Methods of analyzing polymers using ordered label strategies

ABSTRACT

The invention relates to methods and products for analyzing polymers. The polymers are analyzed by reconstructing sequence information from population data sets. The data sets include information about polymer dependent impulses arising from the polymers. The invention is also a method for linearly analyzing polymers by assessing the intensity of a signal arising from the polymer. The signal is generated as units and/or unit specific markers pass a fixed station. The quantitative intensity of the signal is proportional to the number of units and/or unit specific markers giving rise to the signal.

RELATED APPLICATIONS

[0001] This application claims priority to U.S. Provisional Patent Application No. 60/096,666, filed Aug. 13, 1998 and No. 60/096,662, filed Aug. 13, 1998 and is a continuation in part of U.S. patent Ser. No. 09/134,411 filed on Aug. 13, 1998, currently pending, which is a continuation of PCT/US98/03024 filed on Feb. 11, 1998, which claims priority to U.S. Provisional Patent Application No. 60/064,687, filed May 5, 1997 and No. 60/037,921, filed Feb. 11, 1997, the entire contents of which are hereby incorporated by reference.

FIELD OF THE INVENTION

[0002] The present invention relates to methods and products for analyzing polymers. In particular, the methods are based on generation of information from a data set of polymer dependent impulses arising from polymers which have been labeled according to an ordered strategy. The information generated relates to many aspects of the polymer such as the length of the polymer, the composition of units within the polymer, the order of units in the polymer, and the sequence or partial sequence of units in the polymer. The invention also relates to methods for intensity based analysis.

BACKGROUND OF THE INVENTION

[0003] Polymers are involved in diverse and essential functions in living systems. The ability to decipher the function of polymers in these systems is integral to the understanding of the role that the polymer plays within a cell. Often the function of a polymer in a living system is determined by analyzing the structure and determining the relation between the structure and the function of the polymer. By determining the primary sequence in a polymer such as a nucleic acid it is possible to generate expression maps, to determine what proteins are expressed, and to understand where mutations occur in a disease state. Because of the wealth of knowledge that may be obtained from sequencing of polymers, many methods have been developed to achieve more rapid and more accurate sequencing.

[0004] In general DNA sequencing is currently performed using one of two methods. The first and more popular method is the dideoxy chain termination method described by Sanger et al. (1977). This method involves the enzymatic synthesis of DNA molecules terminating in dideoxynucleotides. By using the four ddNTPs, a population of molecules terminating at each position of the target DNA can be synthesized. Subsequent analysis yields information on the length of the DNA molecules and the base at which each molecule terminates (either A, C, G, or T). With this information, the DNA sequence can be determined. The second method is Maxam and Gilbert sequencing (Maxam and Gilbert, 1977), which uses chemical degradation to generate a population of molecules degraded at certain positions of the target DNA. With knowledge of the cleavage specificities of the chemical reactions and the lengths of the fragments, the DNA sequence is generated. Both methods rely on polyacrylamide gel electrophoresis and photographic visualization of the radioactive DNA fragments. Each process takes about 1-3 days. The Sanger sequencing reactions can only generate 300-800 bases in one run.

[0005] Methods to improve the output of sequence information using the Sanger method also have been proposed. These Sanger-based methods include multiplex sequencing, capillary gel electrophoresis, and automated gel electrophoresis. Recently, there has also been increasing interest in developing Sanger independent methods. Sanger independent methods use a completely different methodology to realize the base information. This category includes scanning tunneling microscopy (STM), mass spectrometry, enzymatic luminometric inorganic pyrophosphate detection assay (ELIDA) sequencing, exonuclease sequencing, and sequencing by hybridization.

[0006] Further, several new methods have been described for carboxy terminal sequencing of polypeptides. See Inglis, A. S., Anal. Biochem. 195:183-96 (1991). Carboxy terminal sequencing methods mimic Edman degradation but involve sequential degradation from the opposite end of the polymer. See Inglis, A. S., Anal. Biochem. 195:183-96 (1991). Like Edman degradation, the carboxy-terminal sequencing methods involve chemically induced sequential removal and identification of the terminal amino acid residue.

[0007] More recently, polypeptide sequencing has been described by preparing a nested set (sequence defining set) of polymer fragments followed by mass analysis. See Chait, B. T. et al., Science 257:1885-94 (1992). Sequence is determined by comparing the relative mass difference between fragments with the known masses of the amino acid residues. Though formation of a nested (sequence defining) set of polymer fragments is a requirement of DNA sequencing, this method differs substantially from the conventional protein sequencing method consisting of sequential removal and identification of each residue. Although this method has potential, in practice it has encountered several problems and has not been demonstrated to be an effective method.

SUMMARY OF THE INVENTION

[0008] The present invention relates in some aspects to methods and products for analyzing polymers. In particular the invention in one aspect is a method for identifying information about a polymer such as its sequence, length, order of bases etc., by obtaining polymer dependent impulses from a population of polymers and comparing the polymer dependent impulses to determine unit specific information about the polymers.

[0009] Recently, methods for analyzing polymers based on unit specific information about the polymer have been developed. Such methods are described in co-pending PCT patent application No. PCT/US98/03024 and U.S. Ser. No. 09/134,411 filed Aug. 13, 1998, the entire contents of which are hereby incorporated by reference. The method for analyzing polymers described in PCT/US98/03024 and Ser. No. 09/134,411 is based on the ability to examine each unit or unit specific marker of a polymer individually. By examining each unit or unit specific marker individually the type of units and the position of the units on the backbone of the polymer can be identified. This can be accomplished by positioning a labeled unit or unit specific marker at a station and examining a change which occurs when that labeled unit or unit specific marker is proximate to the station. The change can arise as a result of an interaction that occurs between the unit or unit specific marker and the station or a partner and is specific for the particular unit or unit specific marker. For instance if the polymer is a nucleic acid molecule and a T is positioned in proximity to a station a change which is specific for a T could occur. If on the other hand, a G is positioned in proximity to a station then a change which is specific for a G could occur. The specific change which occurs, for example, depends on the station used, the type of polymer being studied and/or the label used. For instance the change may be an electromagnetic signal which arises as a result of the interaction.

[0010] Methods for analyzing polymers based on unit specific information about the polymer involve the detection of polymer dependent impulses from a plurality of polymers to produce a data set of information. The data set can be compared to provide specific information about the polymer such as the composition of units in the polymer, the length of the polymer, the presence of specific sequences in the polymer, and even the entire sequence of the units in the polymer.

[0011] In one aspect the invention is a method for generating unit specific information about a polymer. The method includes the steps of obtaining polymer dependent impulses for a plurality of labeled polymers, comparing the polymer dependent impulses obtained from each of the plurality of labeled polymers, and determining unit specific information about the polymers based upon comparing the polymer dependent impulses. Preferably the polymer dependent impulses arise from unit specific markers of less than all units of the polymers. In an embodiment the polymer dependent impulses arise from at least two unit specific markers of the polymers.

[0012] The plurality of polymers may be any type of polymer but preferably is a nucleic acid. In one embodiment the plurality of polymers is a homogenous population. In another embodiment the plurality of polymers is a heterogenous population. The polymers can be labeled randomly or non-randomly. Different labels can be used to label different linked units to produce different polymer dependent impulses.

[0013] The polymer dependent impulses provide many different types of structural information about the polymer. For instance the obtained polymer dependent impulses may include an order of polymer dependent impulses or the obtained polymer dependent impulses may include the time of separation between specific signals or the number of specific polymer dependent impulses. The obtained polymer dependent impulses may indicate the sequence of units of the polymer.

[0014] In one important embodiment the polymer dependent impulses are obtained by moving the plurality of polymers linearly past a signal generation station.

[0015] According to another embodiment the unit specific markers are nucleic acid probes. In another embodiment the unit specific markers are peptide nucleic acid probes.

[0016] The unit specific markers may identify a single unit of a polymer or multiple units of a polymer. When the polymer is a nucleic acid the unit specific marker may be a nucleic acid probe. In one embodiment the unit specific marker is a nucleic acid probe having at least two base pairs. In another embodiment the unit specific marker is a nucleic acid probe having at least three base pairs.

[0017] According to another aspect of the invention a method for sequencing a polymer of linked units is provided. The method includes the steps of obtaining polymer dependent impulses from a plurality of overlapping polymers, at least a portion of each of the polymers having a sequence of linked units identical to the other of the polymers, and comparing the polymer dependent impulses from an overlapping portion of each of the plurality of polymers to obtain a sequence of linked units which is identical in the plurality of polymers.

[0018] The polymer dependent impulses may be detected by many means. A preferred method of detection is optical detection.

[0019] The plurality of polymers may be any type of polymer but preferably is a nucleic acid. Preferably the nucleic acids are labeled with an agent selected from the group consisting of an electromagnetic radiation source, a quenching source and a fluorescence excitation source. In one embodiment the plurality of polymers is a homogenous population. In another embodiment the plurality of polymers is a heterogenous population. The polymers can be labeled randomly or non-randomly. Different labels can be used to label different linked units to produce different polymer dependent impulses.

[0020] The polymer dependent impulses provide many different types of structural information about the polymer. For instance the obtained polymer dependent impulses may include an order of polymer dependent impulses or the obtained polymer dependent impulses may include the time of separation between specific signals or the number of specific polymer dependent impulses. The obtained polymer dependent impulses may indicate the sequence of units of the polymer.

[0021] In one important embodiment the polymer dependent impulses are obtained by moving the plurality of polymers linearly past a signal generation station.

[0022] According to another embodiment the unit specific marker is a nucleic acid probe. In another embodiment the unit specific marker is a peptide nucleic acid probe. In another embodiment, the unit specific marker is a peptide.

[0023] The unit specific markers may identify a single unit of a polymer or multiple units of a polymer. When the polymer is a nucleic acid the unit specific marker may be a nucleic acid probe. In one embodiment the unit specific marker is a nucleic acid probe having at least three base pairs. In another embodiment the unit specific markers are three base pair nucleic acid probes.

[0024] The invention in another aspect is a kit for labeling polymers. The kit includes a container housing a series of distinct nucleic acid probes, wherein the series of nucleic acid probes is a set of multiple base pair probes. Preferably the multiple base pair probes are selected from the group consisting of two base pair probes, three base pair probes, four base pair probes, and five base pair probes.
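For illustration only, the size of such a probe series can be sketched under the assumption that the series contains every possible sequence of a given length over the four DNA bases. The text does not state that a kit must be exhaustive, so the function name `enumerate_probes` and the completeness assumption are hypothetical.

```python
from itertools import product

BASES = "ACGT"  # assumption: probes are drawn from the four DNA bases


def enumerate_probes(length):
    """Return every possible probe sequence of the given length.

    Hypothetical helper: assumes an exhaustive probe series, which the
    text does not require.
    """
    if length < 1:
        raise ValueError("probe length must be at least 1")
    return ["".join(bases) for bases in product(BASES, repeat=length)]


if __name__ == "__main__":
    # An exhaustive series of length k holds 4**k distinct probes.
    for k in (2, 3, 4, 5):
        print(k, len(enumerate_probes(k)))
```

Under this assumption an exhaustive series grows as 4 to the power of the probe length, which gives a rough sense of how many compartments or containers a complete kit of each probe length would need.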

[0025] In one embodiment the container is a single container having a plurality of compartments, each housing a specific labeled probe. In another embodiment the container is a plurality of containers.

[0026] The kit in one embodiment also includes instructions for labeling the nucleic acid probes.

[0027] The distinct nucleic acid probes are labeled in one embodiment. Preferably the nucleic acid probes are labeled with an agent selected from the group consisting of an electromagnetic radiation source, a quenching source and a fluorescence excitation source. In one embodiment the plurality of polymers is a homogenous population. In another embodiment the distinct nucleic acid probes are three base pair probes. In another embodiment the distinct nucleic acid probes are four base pair probes. In yet another embodiment the distinct nucleic acid probes are five base pair probes.

[0028] The invention in other aspects relates to methods and products for linear analysis of polymers using an intensity based method for identifying information about the polymer such as its sequence, length, order of bases etc. The methods can be accomplished using intensity based measurements combined with the ordered labeling strategy discussed above.

[0029] One aspect of linear analysis involves the movement of the polymer past a fixed station in such a manner as to cause a signal that provides information about the polymer to arise. According to an aspect of the invention it was discovered that information about the polymer can be determined by quantitatively measuring intensity of the signal arising at the station. The signal arises from the polymer as a result of the units of the polymer passing the fixed station. In some cases all of the units may cause the generation of a signal and in other cases less than all of the units produce the signal. The total intensity of the signal is proportional to the number of units or unit specific markers which generate a signal as they pass the fixed station. If the signal arises from every unit of the polymer then the intensity of the signal is proportional to the number of units in the polymer. If the signal arises from less than all of the units or unit specific markers of the polymer then the intensity of the signal is proportional to that number of units or unit specific markers causing generation of the signal. The number of units or unit specific markers indicated by the intensity can be used to determine information about the polymer such as the composition of units in the polymer, the length of the polymer, the presence of specific sequences in the polymer, and even the entire sequence of the units in the polymer.
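The proportionality described above can be illustrated with a minimal sketch. It assumes that every marker contributes the same known intensity and that a constant background can be subtracted; the function name, its parameters, and the calibration values are hypothetical and are not taken from the text.

```python
def estimate_marker_count(total_intensity, intensity_per_marker, background=0.0):
    """Estimate how many markers produced a measured total intensity.

    Assumes (hypothetically) that each marker contributes the same
    intensity and that the background level is constant and known.
    """
    if intensity_per_marker <= 0:
        raise ValueError("intensity_per_marker must be positive")
    corrected = max(total_intensity - background, 0.0)
    # Intensity is proportional to marker count, so divide and round.
    return round(corrected / intensity_per_marker)


if __name__ == "__main__":
    # Illustrative values only: 1050 units measured, 50 units of background,
    # 100 units contributed by each marker.
    print(estimate_marker_count(1050.0, 100.0, background=50.0))
```

If every unit of the polymer carries a marker, the same estimate corresponds to the polymer length in units; if only some units are labeled, it corresponds to the number of labeled units.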

[0030] The invention in another aspect is a method for analyzing a polymer by linearly moving a labeled polymer with respect to a fixed station, obtaining a signal from the labeled polymer as the labeled polymer passes the fixed station, wherein the signal is an electromagnetic radiation signal arising from an interaction between at least two distinct labeled unit specific markers and determining a quantitative measure of intensity of the signal to analyze the polymer.

[0031] The intensity of the signal provides various types of structural information about a polymer, depending on how the polymer is labeled. In one embodiment each unit of the labeled polymer is labeled with a unit specific marker and the quantitative measure of intensity of the signal indicates the length of the polymer. In another embodiment less than all units of the polymer are labeled with at least one unit specific marker and the quantitative measure of intensity of the signal indicates the number of labeled unit specific markers present in the polymer.

[0032] The fixed station which gives rise to the signal when the labeled polymer interacts with the station in one embodiment is an electromagnetic radiation source. In another embodiment the fixed station is a radiation source.

[0033] More than one polymer may be analyzed to generate a data set representative of a population of polymers. Thus in one embodiment a plurality of polymers are analyzed simultaneously to produce a plurality of signals, one signal for each polymer, and the method further comprises the step of comparing the intensities of the signals to analyze the polymers.

[0034] The labeled polymer may be labeled with a unit specific marker. In one embodiment the unit specific marker is a peptide nucleic acid probe. In another embodiment the unit specific marker is a series of distinct nucleic acid probes selected from the group consisting of two base pair probes, three base pair probes, four base pair probes, and five base pair probes. In yet another embodiment the unit specific marker is a fluorescent probe.

[0035] According to another embodiment the labeled polymer is labeled with a plurality of unit specific markers, wherein at least one unit specific marker includes a fluorophore which emits light at a first wavelength and at least one unit specific marker which includes a fluorophore which emits light at a second wavelength. In another embodiment the at least one unit specific marker which includes the fluorophore which emits light at the first wavelength is attached to end units of the polymer and wherein the at least one unit specific marker which emits light at the second wavelength is attached to an internal unit of the polymer.

[0036] Each of the limitations of the invention can encompass various embodiments of the invention. It is, therefore, anticipated that each of the limitations of the invention involving any one element or combinations of elements can be included in each aspect of the invention.

BRIEF DESCRIPTION OF THE DRAWINGS

[0037] FIG. 1 shows a schematic of a random labeling method in which two differently labeled DNA samples are used.

[0038] FIG. 2 is a graph of raw data demonstrating changes in energy emission patterns to determine distance information through the instantaneous rate method. The changes in energy patterns result from sequential detectable signals which when plotted produce a curve that from left to right shows two energy intensity decreases, followed by two energy intensity increases. The rate is 6.8 A/s and t₁ is the time between the entry of the first and second labels.

[0039] FIG. 3 shows a schematic of a random labeling method using a one nucleotide labeling scheme where less than all of the one nucleotide are labeled.

[0040] FIG. 4 shows a schematic of a random labeling method using a two nucleotide labeling scheme.

[0041] FIG. 5 shows a schematic of a random labeling method using two differently labeled nucleic acid samples.

[0042] FIG. 6 illustrates the generation of sequence information from the sorted data.

[0043] FIG. 7 illustrates the labeling of unit specific markers with more than one type of label.

[0044] FIG. 8 shows a schematic of a random labeling method using triplet unit specific markers one of which is kept constant during the analysis.

[0045] FIG. 9 shows a schematic of a random labeling method of a double stranded nucleic acid analysis using direction specific labels.

[0046] FIG. 10 is a sample kit according to the invention.

[0047] FIG. 11A is a schematic representation of a labeled DNA molecule moving through a nanochannel plate and 11B is a fluorescence illumination graph from which intensity can be determined.

[0048] FIG. 12A is a schematic diagram of a waveguide structure with a labeled DNA molecule passing through it and 12B is similar to 11B.

[0049] FIG. 13 is a schematic diagram of a hexagonally packed bead nanostructure for analyzing polymers.

BRIEF DESCRIPTION OF THE SEQUENCES

[0050] SEQ. ID. NO. 1 is a hypothetical nucleic acid sequence.

[0051] SEQ. ID. NO. 2 is a hypothetical nucleic acid sequence.

[0052] SEQ. ID. NO. 3 is a hypothetical nucleic acid sequence.

DETAILED DESCRIPTION OF THE INVENTION

[0053] The invention is a method for analyzing polymers based on a compilation of data obtained from incomplete labeling of the polymers. The methods can be performed using data generated from single unit labels or multiple unit labels (both referred to herein as unit specific markers), single stranded polymers, double stranded polymers, or combinations thereof.

[0054] One advantage of the invention is that the method provides a rational means of deciphering incomplete labeling schemes into information, e.g., sequence information about the polymer, without requiring the labeling of each unit within a polymer. There are certain physical limits to labeling polymers such as DNA, which make it very difficult to completely label every nucleotide in a strand of DNA. For instance, in a single strand of DNA, the replacement of the native bases by fluorescently labeled bases is hindered by the major groove and base-to-base steric interactions of fluorophores derivatized to adjacent bases. Stack hindrance problems have made common methods of DNA analysis such as exonuclease sequencing difficult to perform. The methods of the invention provide the ability to decipher incomplete labeling schemes through various labeling methods to generate sequence information about polymers rapidly and accurately. The methods of the invention also provide enhanced resolution over prior art sequencing methods.

[0055] In one aspect the invention is a method for generating unit specific information about a polymer. The method includes the steps of obtaining polymer dependent impulses for a plurality of labeled polymers, comparing the polymer dependent impulses of the plurality of labeled polymers, and determining unit specific information about the polymers based upon the polymer dependent impulses. Preferably the polymer dependent impulses arise from unit specific markers of less than all units of the polymers.
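As a rough illustration of how impulses from incompletely labeled polymers might be compared across a population, the sketch below pools label positions observed on many copies of the same polymer. It rests on strong assumptions that are not stated in the text: each copy has a random subset of one unit type labeled, each impulse has already been converted to an integer position measured from a common reference end, and a position is accepted only when at least `min_support` copies agree. The function name and parameters are hypothetical.

```python
from collections import Counter


def merge_label_positions(population, min_support=2):
    """Pool label positions seen across partially labeled copies.

    population: one list of observed integer positions per polymer copy
    (assumed to be measured from a common reference end).
    Returns the sorted positions supported by at least min_support copies.
    """
    counts = Counter()
    for positions in population:
        # Count each position at most once per polymer copy.
        counts.update(set(positions))
    return sorted(pos for pos, seen in counts.items() if seen >= min_support)


if __name__ == "__main__":
    # Illustrative data: five copies, each with only some units labeled.
    copies = [[0, 4], [4, 9], [0, 9, 12], [12, 4], [7]]
    print(merge_label_positions(copies))
```

In this toy data set no single copy reveals all four positions, but the pooled population does, while the position reported by only one copy is treated as unsupported. A real analysis would also need to handle measurement noise and unknown orientation, which this sketch ignores.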

[0056] As used herein the term “unit specific information” refers to any structural information about one, some, or all of the units of the polymer. The structural information obtained by analyzing a polymer according to the methods of the invention may include the identification of characteristic properties of the polymer which (in turn) allows, for example, for the identification of the presence of a polymer in a sample or a determination of the relatedness of polymers, identification of the size of the polymer, identification of the proximity or distance between two or more individual units or unit specific markers of a polymer, identification of the order of two or more individual units or unit specific markers within a polymer, and/or identification of the general composition of the units or unit specific markers of the polymer. Since the structure and function of biological molecules are interdependent, the structural information can reveal important information about the function of the polymer.

[0057] The methods of the invention are performed by detecting signals referred to as polymer dependent impulses. A “polymer dependent impulse” as used herein is a detectable physical quantity which transmits or conveys information about the structural characteristics of a unit specific marker of a polymer. The physical quantity may be in any form which is capable of being detected. For instance the physical quantity may be electromagnetic radiation, chemical conductance, electrical conductance, etc. The polymer dependent impulse may arise from energy transfer, quenching, changes in conductance, radioactivity, mechanical changes, resistance changes, or any other physical changes. Although the polymer dependent impulse is specific for a particular unit specific marker, a polymer having more than one of a particular labeled unit specific marker will have more than one identical polymer dependent impulse. Additionally, each unit specific marker of a specific type may give rise to different polymer dependent impulses if they have different labels. In some embodiments when intensity of a signal is being measured, the polymer dependent impulse is an optical signal.

[0058] The method used for detecting the polymer dependent impulse depends on the type of physical quantity generated. For instance if the physical quantity is electromagnetic radiation then the polymer dependent impulse is optically detected. An “optically detectable” polymer dependent impulse as used herein is a light based signal in the form of electromagnetic radiation which can be detected by light detecting imaging systems. In some embodiments the intensity of this signal is measured. When the physical quantity is chemical conductance then the polymer dependent impulse is chemically detected. A “chemically detected” polymer dependent impulse is a signal in the form of a change in chemical concentration or charge such as an ion conductance which can be detected by standard means for measuring chemical conductance. If the physical quantity is an electrical signal then the polymer dependent impulse is in the form of a change in resistance or capacitance.

[0059] A “plurality of polymers” is at least two polymers. A plurality of polymers in one embodiment is at least 50 polymers and in another embodiment is at least 100 polymers.

[0060] The polymer dependent impulses may provide any type of structural information about the polymer. For instance these signals may provide the entire or portions of the entire sequence of the polymer, the order of polymer dependent impulses, or the time of separation between polymer dependent impulses as an indication of the distance between the units or unit specific markers.
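The relationship between the order of impulses, their time of separation, and the distance between markers can be sketched as follows. The sketch assumes the polymer moves past the station at a constant, known rate, so that distance equals rate multiplied by elapsed time; the `Impulse` record, the function name, and the example values are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Impulse:
    time_s: float  # detection time in seconds
    label: str     # which unit specific marker produced the impulse


def summarize_impulses(impulses, rate_per_s):
    """Return (order, time separations, distances) for one polymer.

    Assumes a constant translocation rate, so each distance is simply
    rate_per_s multiplied by the time between consecutive impulses.
    """
    ordered = sorted(impulses, key=lambda imp: imp.time_s)
    order = [imp.label for imp in ordered]
    separations = [b.time_s - a.time_s for a, b in zip(ordered, ordered[1:])]
    distances = [dt * rate_per_s for dt in separations]
    return order, separations, distances


if __name__ == "__main__":
    observed = [Impulse(2.5, "G"), Impulse(0.5, "A"), Impulse(1.5, "T")]
    print(summarize_impulses(observed, rate_per_s=2.0))
```

The distances are expressed in whatever length unit the rate uses. If the rate varies during the measurement, the constant-rate assumption no longer holds and the separations would need to be integrated against the instantaneous rate instead.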

[0061] The polymer dependent impulse arises from a detectable physical change in the unit specific marker of the polymer or the station (or environment surrounding the station). As used herein a “detectable physical change” in the unit specific marker of the polymer or the station is any type of change which occurs in the unit specific marker of the polymer or the station as a result of exposing the unit specific marker to the station. Once the unit specific marker is exposed to the station a detectable signal or polymer dependent impulse is created. The station may be for instance, an interaction station or a signal generation station. The type of change that occurs in the station or the unit specific marker to produce the detectable signal or polymer dependent impulse depends on the type of station and the type of unit specific marker. Several examples of station-unit specific marker combinations which undergo a change to produce a detectable signal are discussed herein for exemplary purposes. Those of skill in the art will be able to derive other station-unit specific marker combinations that fall within the scope of the invention.

[0062] The polymer dependent impulses are obtained by interaction which occurs between the unit specific marker of the polymer and a signal generation station or the environment at a signal generation station. A “signal generation station” as used herein is a station that is an area where the unit specific marker interacts with the station or the environment to generate a polymer dependent impulse. In some aspects of the invention the polymer dependent impulse results from contact in a defined area with an agent selected from the group consisting of electromagnetic radiation, a quenching source, and a fluorescence excitation source which can interact with the unit specific marker to produce a detectable signal or polymer dependent impulse. In other aspects the polymer dependent impulse results from contact in a defined area with a chemical environment which is capable of undergoing specific changes in conductance in response to an interaction with a molecule. As a molecule with a specific structure interacts with the chemical environment a change in conductance occurs. The change which is specific for the particular structure may be a temporal change, e.g., the length of time required for the conductance to change may be indicative that the interaction involves a specific structure or a physical change. For instance, the change in intensity of the interaction may be indicative of an interaction with a specific structure. In other aspects the polymer dependent impulse results from changes in capacitance or resistance caused by the movement of the unit specific marker between microelectrodes or nanoelectrodes positioned adjacent to the polymer unit specific marker. For instance the signal generation station may include microelectrodes or nanoelectrodes positioned on opposite sides of the polymer unit specific marker. The changes in resistance or conductance which occur as a result of the movement of the unit specific marker past the electrodes will be specific for the particular unit specific marker.

[0063] The invention also relates to a method of analyzing polymers using linear analysis to generate an optical signal, wherein the intensity of the optical signal provides information about the polymer to analyze the polymer. The generated optical signal may be any type of electromagnetic radiation signal for which intensity can be determined (e.g., fluorescence, radiation, etc.).

[0064] As used herein “similar polymers” are polymers which have at least one overlapping region. Similar polymers may be a homogeneous population of polymers or a heterogenous population of polymers. A “homogeneous population” of polymers as used herein is a group of identical polymers. A “heterogenous population” of similar polymers is a group of similar polymers which are not identical but which include at least one overlapping region of identical units. An overlapping region typically consists of at least 10 contiguous nucleotides. In some cases an overlapping region consists of at least 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, or 22 contiguous nucleotides.
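The definition of similar polymers can be made concrete with a small sketch that finds the longest run of identical contiguous units shared by two sequences and compares its length with a threshold. The dynamic programming approach is a standard longest common substring computation chosen for illustration; the function names and the example sequences are hypothetical.

```python
def longest_shared_run(a, b):
    """Return the longest contiguous run of units common to a and b."""
    best_len, best_end = 0, 0
    prev = [0] * (len(b) + 1)
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            if a[i - 1] == b[j - 1]:
                # Extend the run that ended at the previous pair of units.
                curr[j] = prev[j - 1] + 1
                if curr[j] > best_len:
                    best_len, best_end = curr[j], i
        prev = curr
    return a[best_end - best_len:best_end]


def are_similar(a, b, min_overlap=10):
    """True if a and b share at least min_overlap contiguous units."""
    return len(longest_shared_run(a, b)) >= min_overlap


if __name__ == "__main__":
    first = "TTTTACGTACGTAGCC"
    second = "ACGTACGTAGGGGG"
    print(longest_shared_run(first, second), are_similar(first, second))
```

The default threshold of 10 follows the typical overlapping region size given in the text; passing a larger `min_overlap` models the stricter cases. This check compares the two sequences in one orientation only.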

[0065] A “plurality of labeled polymers” refers to two or more similar polymers which are labeled intrinsically or extrinsically. Preferably a plurality of similar polymers is 50 or more similar polymers. More preferably a plurality of similar polymers is 100 or more similar polymers.

[0066] A “data set” as used herein is a set of information defining the polymer dependent impulses generated by similar polymers. The data set is analyzed as discussed above and the method of analysis used depends on the type of labeling scheme used to generate the labeled polymers.

[0067] Nucleic acid sequencing is a particularly preferred embodiment of the methods of the invention. Currently, less than 5% of the human genome has been sequenced. This translates into a small fraction of the ideal in human sequence knowledge, which is the sequence of all individuals. For instance, for the human population, there are 1.5×10¹⁹ bases (5 billion people×3×10⁹ bases/person). So far, only 2×10⁻¹⁰ percent of all human genetic information is known. The rate of sequencing of the human genome by all world-wide efforts is roughly 3×10⁹ bases/15 years, or 550,000 bases/day, at a cost of >$1/base. Sequencing by the methods of the invention described herein will constitute an inordinate breakthrough in the rate of sequencing. The predicted time to complete one human genome with one machine is ~15 hours. Several dynamic arrays in parallel will be able to complete the sequence of one human genome in a fraction of an hour.
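The population-scale figures in this paragraph follow from simple arithmetic, which the sketch below recomputes from the quantities quoted in the text. The 365-day year and the function names are assumptions made for illustration.

```python
def total_population_bases(people, bases_per_genome):
    """Total bases across a population, one genome per person."""
    return people * bases_per_genome


def bases_per_day(bases, years, days_per_year=365):
    """Average daily rate for sequencing `bases` over `years` years."""
    return bases / (years * days_per_year)


if __name__ == "__main__":
    people = 5_000_000_000            # 5 billion people, as quoted
    bases_per_genome = 3_000_000_000  # 3 x 10^9 bases per person, as quoted

    # 5 x 10^9 people times 3 x 10^9 bases each gives 1.5 x 10^19 bases.
    print(total_population_bases(people, bases_per_genome))

    # One genome over 15 years is roughly 548,000 bases per day,
    # in line with the rounded figure of 550,000 bases/day.
    print(round(bases_per_day(bases_per_genome, years=15)))
```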

[0068] A method for sequencing a polymer of linked units is also encompassed by the invention. The method is performed by obtaining polymer dependent impulses from each of a plurality of overlapping polymers, at least a portion of each of the polymers having a sequence of linked units identical to the other of the polymers, and comparing the polymer dependent impulses to obtain a sequence of linked units which is identical in the plurality of polymers.

[0069] The plurality of overlapping polymers is a set of polymers in which each polymer has at least a portion of its sequence of linked units which is identical to the other polymers. The portion of sequence which is identical is referred to as the overlapping region and includes at least ten contiguous units.

[0070] A “polymer” as used herein is a compound having a linear backboneof individual units which are linked together by linkages. In some casesthe backbone of the polymer may be branched. Preferably the backbone isunbranched. The term “backbone” is given its usual meaning in the fieldof polymer chemistry. The polymers may be heterogeneous in backbonecomposition thereby containing any possible combination of polymer unitslinked together such as peptide-nucleic acids (which have amino acidslinked to nucleic acids and have enhanced stability). In a preferredembodiment the polymers are homogeneous in backbone composition and are,for example, nucleic acids, polypeptides, polysaccharides,carbohydrates, polyurethanes, polycarbonates, polyureas,polyethyleneimines, polyarylene sulfides, polysiloxanes, polyimides,polyacetates, polyamides, polyesters, or polythioesters. In the mostpreferred embodiments, the polymer is a nucleic acid or a polypeptide. A“nucleic acid” as used herein is a biopolymer comprised of nucleotides,such as deoxyribose nucleic acid (DNA) or ribose nucleic acid (RNA). Apolypeptide as used herein is a biopolymer comprised of linked aminoacids.

[0071] As used herein with respect to linked units of a polymer, “linked” or “linkage” means two entities are bound to one another by any physicochemical means. Any linkage known to those of ordinary skill in the art, covalent or non-covalent, is embraced. Such linkages are well known to those of ordinary skill in the art. Natural linkages, which are those ordinarily found in nature connecting the individual units of a particular polymer, are most common. Natural linkages include, for instance, amide, ester and thioester linkages. The individual units of a polymer analyzed by the methods of the invention may be linked, however, by synthetic or modified linkages. Polymers in which the units are linked by covalent bonds will be most common, but polymers in which the units are linked by hydrogen bonds or other non-covalent means are also included.

[0072] The polymer is made up of a plurality of individual units. An “individual unit” as used herein is a building block or monomer which can be linked directly or indirectly to other building blocks or monomers to form a polymer. The polymer preferably is a polymer of at least two different linked units. The at least two different linked units may produce or be labeled to produce different signals, as discussed in greater detail below. The particular type of unit will depend on the type of polymer. For instance, DNA is a biopolymer having a deoxyribose phosphate backbone and units of purines and pyrimidines such as adenine, cytosine, guanine, thymine, 5-methylcytosine, 2-aminopurine, 2-amino-6-chloropurine, 2,6-diaminopurine, hypoxanthine, and other naturally and non-naturally occurring nucleobases, substituted and unsubstituted aromatic moieties. RNA is a biopolymer having a ribose phosphate backbone and units of purines and pyrimidines such as those described for DNA but wherein uracil is substituted for thymine. The DNA nucleotides may be linked to one another by their 5′ or 3′ hydroxyl group thereby forming an ester linkage. The RNA nucleotides may be linked to one another by their 5′, 3′ or 2′ hydroxyl group thereby forming an ester linkage. Alternatively, DNA or RNA units having a terminal 5′, 3′ or 2′ amino group may be linked to the other units of the polymer by the amino group thereby forming an amide linkage.

[0073] Whenever a nucleic acid is represented by a sequence of letters it will be understood that the nucleotides are in 5′→3′ order from left to right and that “A” denotes adenosine, “C” denotes cytidine, “G” denotes guanosine, “T” denotes thymidine, and “U” denotes uridine unless otherwise noted.

[0074] The polymers may be native or naturally-occurring polymers whichoccur in nature or non-naturally occurring polymers which do not existin nature. The polymers typically include at least a portion of anaturally occurring polymer. The polymers can be isolated or synthesizedde novo. For example, the polymers can be isolated from natural sourcese.g. purified, as by cleavage and gel separation or may be synthesizede.g.,(i) amplified in vitro by, for example, polymerase chain reaction(PCR); (ii) synthesized by, for example, chemical synthesis; (iii)recombinantly produced by cloning, etc.

[0075] The polymer or at least one unit specific marker thereof is in aform which is capable of interacting with an agent or station to producea signal (polymer dependent impulse) characteristic of that interaction.The unit specific marker of a polymer which is capable of undergoingsuch an interaction is said to be labeled. If a unit specific marker ofa polymer can undergo that interaction to produce a characteristicsignal, then the polymer is said to be intrinsically labeled. It is notnecessary that an extrinsic label be added to the polymer. If anon-native molecule, however, must be attached to the individual unitspecific marker of the polymer to generate the interaction producing thecharacteristic signal, then the polymer is said to be extrinsicallylabeled. The “label” may be, for example, light emitting, energyaccepting, fluorescent, radioactive, or quenching.

[0076] Many naturally occurring units of a polymer are light emitting compounds or quenchers. For instance, nucleotides of native nucleic acid molecules have distinct absorption spectra, e.g., A, G, T, C, and U have absorption maxima at 259 nm, 252 nm, 267 nm, 271 nm, and 258 nm respectively. Modified units which include intrinsic labels may also be incorporated into polymers. A nucleic acid molecule may include, for example, any of the following modified nucleotide units which have the characteristic energy emission patterns of a light emitting compound or a quenching compound: 2,4-dithiouracil, 2,4-diselenouracil, hypoxanthine, mercaptopurine, 2-aminopurine, and selenopurine.

[0077] A unit or unit specific marker may also be considered to be intrinsically labeled when a property of the unit specific marker other than a light emitting, quenching or radioactive property provides information about the identity of the unit specific marker without the addition of an extrinsic label. For instance, the shape and charge of the unit specific marker provide information about the unit specific marker which can result in a specific characteristic signal, such as a change in conductance arising from the blockage of a conductance path by the unit.

[0078] The types of labels useful according to the methods of the invention, guidelines for selecting the appropriate labels, and methods for adding extrinsic labels to polymers are provided in more detail in co-pending PCT patent application PCT/US98/03024 and U.S. Ser. No. 09/134,411, which are incorporated by reference.

[0079] In addition to information about a specific unit the methods ofthe invention may be used to identify greater than one unit at a time inorder to provide information about a polymer. As discussed herein theinvention is useful for detecting polymer dependent impulses arisingfrom unit specific markers, which encompass single units as well asmultiple units. In one aspect the method is carried out by providing alabeled polymer of linked units, detecting signals from labeled unitspecific markers of less than all of the linked units, and storing asignature of the signals detected to analyze the polymer. In this aspectof the invention each unit of the labeled polymer may be labeled with aunit specific marker or less than all of the units may be labeled with aunit specific marker.

[0080] This method is particularly useful for analyzing multiple units of a polymer at one time. This is accomplished by using a unit specific marker which is labeled and which interacts with more than one unit in a sequence specific manner. As used herein a “unit specific marker” is a compound which specifically interacts with one or more units of a polymer and is capable of identifying those units. For instance, unit specific markers for a nucleic acid molecule can be labeled dimers, trimers, etc. which bind to a specific sequence of bases, such as TG, AG, ATC, etc. By identifying the presence or position of the labeled markers, structural information about the polymer can be derived. For instance, the presence of the marker on a polymer can reveal the identity of the polymer. This enables the presence or absence of a polymer in a solution or mixture of polymers to be determined. The order, distance, number etc. of the markers on a polymer can provide information about the sequence or composition of a polymer. Other unit specific markers include but are not limited to sequence specific major and minor groove binders and intercalators, sequence specific DNA and peptide binding proteins, sequence specific peptide-nucleic acids, mass labels, fluorophores, antibodies or fragments thereof, restriction enzymes, probes, doubly tagged nucleotides, etc. Many such unit specific markers exist and are well known to those of skill in the art.

[0081] A “labeled unit specific marker” as used herein is any unitspecific marker in a polymer that identifies a particular unit or units.A labeled unit specific marker includes, for instance, fluorescentmarkers which are bound to a particular unit or units, proteins,peptides, nucleic acids, polysaccharides, short oligomers, tRNA, etc.that recognize and bind to a particular unit or units and that can bedetected by e.g., possessing an intrinsically labeled property orincluding an extrinsic label or by binding to another detection moleculesuch as an antibody.

[0082] This type of analysis can be used in one embodiment to identifyDNA fragments by analyzing the hybridization patterns of multiple probesto individual fragments of polymers. The current state-of-the-artmethods for hybridization analysis of DNA rely upon DNA chips. Themethods of the invention are advantageous for a number of reasons. Thenumber, type, order, and distance between the multiple probes bound toan unknown fragment of DNA can be determined. This information can beused to identify the number of differentially expressed genesunambiguously. Current hybridization approaches can only determine thetype of probes bound to a given fragment. Furthermore, the methods ofthe invention are able to quantitate precisely the actual number ofparticular expressed genes. Current methods rely on quantitation offluorescence intensities, which often give rise to errors due tonon-linearities in the detection system. Given the great amount ofinformation generated, the methods of the invention do not require aselection of expressed genes or unknown nucleic acids to be assayed.This is in contrast to the requirement of different DNA chips fordifferent genes, sets of expressed genes to be analyzed, and alsodifferent organisms. The methods of the invention can identify theunknown expressed genes by computer analysis of the hybridizationpatterns generated. The data obtained from linear analysis of the DNAprobes are then matched with information in a database to determine theidentity of the target DNA. The methods can thus analyze informationfrom hybridization reactions, which can then be applied to diagnosticsand determination of gene expression patterns.

[0083] A “signature” as used herein is a sequence-specific signal arising from a labeled polymer. The signature includes information about the structure of the polymer. For instance, the signature of a polymer may be defined by a series of consecutive unit specific markers or by specific unit specific markers spaced a particular distance apart from one another. The signature of the polymer identifies the polymer. Signatures are useful for uniquely identifying fragments by identifying bases at certain positions along the length of a strand of DNA. The probability of knowing any one position is 1/4. Unambiguous identification of a fragment comes with roughly twenty positions identified ((1/4)²⁰=9.1×10⁻¹³). For a fragment with 20 detected labels and 10% detection/labeling, the size of the fragment needs to be only 200 base pairs. The proposed read length is on the order of kilobases, which should unambiguously identify any fragment. The identification of fragments allows for grouping by similar sequences, making sequence reconstruction by population analysis possible.
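The arithmetic in the preceding paragraph can be checked with a short sketch. The function names are hypothetical and introduced only for illustration, and the sketch assumes that the four bases are equally likely at every position.

```python
import math

NUM_BASE_TYPES = 4  # A, C, G, T; assumed equally likely at each position


def chance_match_probability(known_positions: int) -> float:
    """Probability that an unrelated fragment matches at every known position."""
    return (1 / NUM_BASE_TYPES) ** known_positions


def fragment_length_needed(detected_labels: int, labeling_fraction: float) -> int:
    """Fragment length (in bases) expected to carry the requested number of
    detected labels when only a fraction of the positions are labeled/detected."""
    return round(detected_labels / labeling_fraction)


p_twenty = chance_match_probability(20)            # about 9.1e-13
length_needed = fragment_length_needed(20, 0.10)   # 200 base pairs
```

Twenty identified positions and a 10% detection/labeling rate give the two figures quoted above.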

[0084] The data obtained from the polymer dependent impulses may be stored in a database, or in a data file, in the memory system of the computer. The data for each polymer may be stored in the memory system so that it is accessible by the processor independently of the data for other polymers, for example by assigning a unique identifier to each polymer.

[0085] The information contained in the data and how it is analyzeddepends on the number and type of labeled unit specific markers thatwere caused to interact with the agent to generate signals. For instanceif every unit specific marker of a single polymer, each type of unitspecific marker (e.g., all the A's of a nucleic acid) having a specifictype of label, is labeled then it will be possible to determine fromanalysis of a single polymer the order of every unit specific markerwithin the polymer. If, however, only one of the four types of units ofa nucleic acid is labeled then more data will be required to determinethe complete sequence of the nucleic acid. Additionally, the method ofdata analysis will vary depending on whether the polymer is singlestranded or double stranded or otherwise complexed. Several labelingschemes and methods for analysis using the computer system data producedby those schemes are described in more detail below. The labelingstrategies are described with respect to nucleic acids for ease ofdiscussion. Each of these strategies, however, is useful for labelingall polymers.

[0086] Several different strategies of labeling are possible, involving permutations of different types of units labeled, different percentages of units labeled, and single-stranded or double-stranded labeling. Set forth below are examples of labeling strategies useful according to the invention. The invention is, however, not limited to the exemplary details provided below. The labeling methods described herein and data obtained from such methods are described with reference to DNA to simplify the discussion. The invention, however, is not limited to methods of analyzing DNA, but rather may be utilized with any type of polymer which is composed of individual monomeric units. It will be clear to those of ordinary skill in the art that when the description below refers to DNA or nucleic acids, any polymer may be substituted, and when the description refers to a nucleotide, a base or specifically A, C, T, or G, these terms may be substituted with the particular monomeric units of the desired polymer. For instance, the polymer may be a peptide, and in that case the monomeric units are amino acids. The simplest labeling scheme involves the labeling of all four nucleotides with different labels. Labeling schemes in which three, two, or even one unit are labeled, or wherein various combinations of units are labeled using unit specific markers which span multiple nucleotides, are also possible.

[0087] A four nucleotide labeling scheme can be created where the A's, C's, G's, and T's of a target DNA are labeled with different labels. Such a molecule, if moved linearly past a station, will generate a linear order of signals which correspond to the linear sequence of nucleotides on the target DNA. The advantage of using a four nucleotide strategy is its ease of data interpretation and the fact that the entire sequence of unit specific markers can be determined from a single labeled polymer. Adding extrinsic labels to all four bases, however, may cause steric hindrance problems. In order to reduce this problem the intrinsic properties of some or all of the nucleotides may be used to label the nucleotides. As discussed above, nucleotides are intrinsically labeled because each of the purines and pyrimidines has distinct absorption spectra properties. In each of the labeling schemes described herein the nucleotides may be either extrinsically or intrinsically labeled, but it is preferred that at least some of the nucleotides are intrinsically labeled when the four nucleotide labeling method is used. It is also preferred that when extrinsic labels are used with the four nucleotide labeling scheme the labels be small and neutral in charge to reduce steric hindrance.

[0088] A three nucleotide labeling scheme in which three of the four nucleotides are labeled may also be performed. When only three of the four nucleotides are labeled, analysis of the data generated by the methods of the invention is more complicated than when all four nucleotides are labeled. The data is more complicated because the number and position of the nucleotides of the fourth unlabeled type must be determined separately. One method for determining the number and position of the fourth nucleotide utilizes analysis of two different sets of labeled nucleic acid molecules. For instance, one nucleic acid molecule may be labeled with A, C, and G, and another with C, G, and T. Analysis of the linear order of labeled nucleotides from the two sets yields sequence data. The three nucleotides chosen for each set can have many different possibilities as long as the two sets contain all four labeled nucleotides. For example, the set ACG can be paired with a set of labeled CGT, ACT or AGT.

[0089] The sequence including the fourth nucleotide also may be determined by using only a single labeled polymer, rather than a set of at least two differently labeled polymers, using a negative labeling strategy to identify the position of the fourth nucleotide on the polymer. Negative labeling involves the identification of sequence information based on units which are not labeled. For instance, when three of the nucleotides of a nucleic acid molecule are labeled with a label which provides a single type of signal, the points along the polymer backbone which are not labeled must be due to the fourth nucleotide. This can be accomplished by determining the distance between labeled nucleotides on a nucleic acid molecule. For example, suppose A, C, and G are labeled and the detectable signals generated indicate that the nucleic acid molecule has a sequence of AGGCAAACG (SEQ. ID. No. 1). If the distances between each of the nucleotides in the nucleic acid molecule are equivalent to the known inter-nucleotide distance for a particular combination of nucleotides, except that the distance between G and G is twice the normal inter-nucleotide distance, then a T is positioned between the two G's and the entire molecule has a sequence of AGTGCAAACG (SEQ. ID. No. 2). The distance between nucleotides can be determined in several ways. Firstly, the polymer and the station may be moved relative to one another in a linear manner and at a constant rate of speed such that a single unit specific marker of the nucleic acid molecule will pass the station at a single time interval. If two time intervals elapse between detectable signals then an unlabeled nucleotide, which is not capable of producing a detectable signal, is present within that position. This method of determining the distance between unit specific markers is discussed in more detail below in reference to random one base labeling. Alternatively, the polymer and the station may be caused to interact with one another such that each unit specific marker interacts simultaneously with a station to produce simultaneous detectable signals. Each detectable signal generated occurs at the point along the polymer where the unit specific marker is positioned. The distance between the detectable signals can be calculated directly to determine whether an unlabeled unit specific marker is positioned anywhere along the nucleic acid molecule.
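The negative labeling logic can be sketched as follows. This is a minimal illustration and not the implementation of the invention: it assumes an idealized constant rate of movement so that each nucleotide occupies exactly one time interval, and the function name and input format (a list of interval index and detected base) are hypothetical.

```python
def fill_unlabeled(detections, unlabeled="T"):
    """Insert the unlabeled nucleotide wherever time intervals were skipped.

    detections: list of (interval_index, base) tuples sorted by interval_index,
                one tuple per detectable signal from a labeled nucleotide.
    unlabeled:  the single nucleotide type that carries no label.
    """
    sequence = []
    previous = None
    for interval, base in detections:
        if previous is not None:
            # Each skipped interval corresponds to one unlabeled nucleotide.
            sequence.extend(unlabeled * (interval - previous - 1))
        sequence.append(base)
        previous = interval
    return "".join(sequence)


# Signals read as AGGCAAACG, but two intervals elapse between the two G's.
observed = [(0, "A"), (1, "G"), (3, "G"), (4, "C"), (5, "A"),
            (6, "A"), (7, "A"), (8, "C"), (9, "G")]
full_sequence = fill_unlabeled(observed)  # "AGTGCAAACG"
```

Unlabeled nucleotides before the first signal or after the last one are not recovered by this sketch; end labels of the kind described later for random labeling would be needed to locate them.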

[0090] Nucleic acid molecules may also be labeled according to a two nucleotide labeling scheme. Six sets of two nucleotide labeled nucleic acid molecules can be used to resolve the data and interpret the nucleotide sequence. Ambrose et al., 1993 and Harding and Keller, 1992 have demonstrated the synthesis of large fluorescent DNA molecules with two of the nucleotides completely extrinsically labeled. The average size of the molecules was 7 kb. Six different combinations of two nucleotide labeling are possible using the following formula:

$\binom{n}{k} = \frac{n!}{k!\,(n-k)!} = \frac{4!}{2!\,2!} = 6$

[0091] where n nucleotides are taken k at a time. The possible combinations are AC, AG, AT, CG, CT, and GT. Knowledge of the linear order of the labels in each of the sets allows for successful reconstruction of the nucleic acid sequence. Using a 4-mer (5′-ACGT-3′) as a model sequence, the theory can be demonstrated. The first set, AC, gives the information that there must be a C after the A. This does not give information about the number of nucleotides intervening between the A and the C, nor does it give information about any G's or T's preceding the A. The second set, AG, shows that there is also a G after the A. Set AT shows there is a T after the A. From these three sets, it is then known that the target DNA is a 4-mer and that one C, one G, and one T follow the A. The subsequent sets give information on the ordering of these three nucleotides following the A. Set CG shows that G follows C. Set CT shows that T follows C. Set GT finishes the arrangement to give the final deciphered sequence of 5′-ACGT-3′. In addition to the method using six labeled sets of nucleic acid molecules, the sequence can be established by combining information about the distance between labeled nucleotides generating detectable signals, as described above, with information obtained from fewer than six sets of two nucleotide labeled nucleic acid molecules.
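One way to picture the six-set reconstruction is the sketch below. It is an illustrative procedure rather than the method claimed: it assumes that every instance of the two chosen nucleotides is labeled in each set and that the linear order of labels is read without error. The helper names `project` and `reconstruct` are hypothetical.

```python
from itertools import combinations

BASES = "ACGT"
PAIRS = list(combinations(BASES, 2))  # AC, AG, AT, CG, CT, GT: six sets


def project(sequence, pair):
    """Linear order of labels seen when only the two bases in `pair` are labeled."""
    return [base for base in sequence if base in pair]


def reconstruct(projections):
    """Merge the six ordered label lists back into one sequence.

    A base is the next one in the sequence exactly when it is at the head of
    all three label lists that contain it.
    """
    queues = {pair: list(labels) for pair, labels in projections.items()}
    result = []
    while any(queues.values()):
        for base in BASES:
            relevant = [q for pair, q in queues.items() if base in pair]
            if all(q and q[0] == base for q in relevant):
                for q in relevant:
                    q.pop(0)
                result.append(base)
                break
        else:
            raise ValueError("label orders are mutually inconsistent")
    return "".join(result)


model = "ACGT"
six_sets = {pair: project(model, pair) for pair in PAIRS}
deciphered = reconstruct(six_sets)  # "ACGT"
```

The same merge works for sequences with repeated bases, for example the ten-mer AGTGCAAACG, because the ordering within each pair is preserved in its label list.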

[0092] A fourth labeling scheme, the random one nucleotide labeling scheme, also may be used. In this method, distance information which is obtained by population analysis and/or from the instantaneous rate of DNA movement is used to determine the number of nucleotides separating two labeled nucleotides. Analysis of four differently labeled target molecules yields the complete sequence.

[0093] One method of analysis with these labeling methods includes the use of complementary base information. FIG. 1 demonstrates the labeling strategy in which two differently labeled DNA samples are required. The first sample has two of its non-complementary bases randomly labeled with the same fluorophore. Non-complementary pairs of bases are AC, AG, TC, and TG. The second sample has one of its bases randomly labeled. The base chosen for the second sample can be any of the four bases. In the example given, the two non-complementary bases are chosen to be A and C. As a result, two samples are prepared, one with labeled A's and C's and another with labeled A's. The DNA can be, for example, genomically digested, end-labeled, purified, and analyzed by nanochannel FRET sequencing. The sequence-specific FRET information arising from each fragment is sorted into one of two complementary strand groups. Sorting allows population analysis to determine the positions of all the desired bases. The figure illustrates the generation of sequence information from the sorted data. The first group of analyzed information yields the positions of all the A's and C's on one strand. The second group analyzed yields the positions of all the A's on that strand, so that the A's can be distinguished from the C's. The same procedure is applied to the complementary strand. Knowledge of the complementary strand's A's and C's is identical to knowledge of the T's and G's on the other strand. The result is sequence reconstruction. To cross-verify the sequence, the process can be repeated for the other pairs of non-complementary bases such as TG, TC and AG.

[0094] There are two methods of determining the distance between bases. One requires determining the instantaneous rate of DNA movement, which is readily calculated from the duration of energy transfer or quenching for a particular label. Another involves analyzing a population of target DNA molecules and its corresponding Gaussian distance distributions.

[0095] The instantaneous rate method involves a determination of distance separation based on the known instantaneous rate of DNA movement (v) multiplied by the time of separation between signals (t). Instantaneous rate is found by measuring the time that it takes for a labeled nucleotide to pass by the interaction station. Since the length of the concentrated area of agent (d) is known (through calibration and physical measurement of the localized region of the agent, e.g., the thickness of a concentrated donor fluorophore area), the rate is simply v=d/t. As shown in FIG. 2, analysis of raw data demonstrating changes in energy emission patterns resulting from sequential detectable signals, when plotted, produces a curve which from left to right shows two energy intensity decreases, followed by two energy intensity increases. The plateau from the first energy intensity decrease (denoted t₁) is double that of the second plateau (t₂). The length of the interaction station is given as 51 Å. From this given information, the number of labeled nucleotides is known. Furthermore, the distance of separation of the two is determined by relating the rate of DNA movement to the time of the donor intensity plateaus.

[0096] The number of labeled nucleotides is simply denoted by the number of intensity decreases. In FIG. 2, there are two intensity decreases. Accordingly, there must be two detectable labels on the DNA. To determine the distance of base separation, it is necessary to know the instantaneous rate of DNA movement, which is found by knowing the time for one labeled nucleotide to cross the localized region of the agent and the length of the localized region of the agent. The length of the localized region of the agent is given as 51 Å. The time for one labeled nucleotide to cross the localized region of the agent is bounded by the first intensity decrease and the first intensity increase (denoted as the gray shaded region, 7.5 s). The rate of DNA movement is therefore 51 Å/7.5 s=6.8 Å/s. The base separation is derived from the time separating the labeled nucleotides (t₁=5 s) multiplied by the rate (6.8 Å/s), which gives 34 Å, equal to 10 base pairs at approximately 3.4 Å per base pair. As a means of cross-verification, 51 Å−t₂v also yields the base separation.
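The FIG. 2 calculation can be reproduced numerically. In the sketch below the value of 3.4 Å per base pair is an assumption (the approximate rise of B-form DNA) used to convert the distance into base pairs, t₂ is taken as half of t₁ as stated for the two plateaus, and the function names are hypothetical.

```python
STATION_LENGTH_ANGSTROM = 51.0   # d, length of the localized region of the agent
TRANSIT_TIME_S = 7.5             # time for one labeled nucleotide to cross d
T1_S = 5.0                       # first plateau, time separating the two labels
T2_S = T1_S / 2                  # second plateau, half of the first
RISE_PER_BASE_PAIR = 3.4         # assumed Angstrom per base pair (B-form DNA)


def instantaneous_rate(station_length, transit_time):
    """v = d / t, in Angstrom per second."""
    return station_length / transit_time


def separation_in_base_pairs(rate, time_between_labels):
    """Distance between two labels (v * t) converted to base pairs."""
    return rate * time_between_labels / RISE_PER_BASE_PAIR


v = instantaneous_rate(STATION_LENGTH_ANGSTROM, TRANSIT_TIME_S)   # 6.8 A/s
separation = v * T1_S                                             # 34 A
base_pairs = separation_in_base_pairs(v, T1_S)                    # 10 bp
cross_check = STATION_LENGTH_ANGSTROM - T2_S * v                  # also 34 A
```

Both routes give a separation of about 34 Å, consistent with the cross-verification described above.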

[0097] In the population method the entire population of labeled nucleotides is considered. Knowledge of the length of the localized region of the agent and of the instantaneous rate, as required for the rate method, is not necessary. Use of population analyses statistically eliminates the need for precision measurements on individual nucleic acid molecules.

[0098] An example of population analyses using five nucleic acid molecules each traversing a nanochannel is described below. Five molecules representing a population of identical DNA fragments are prepared. In a constant electric field, the time of detection between the first and second labeled nucleotide should be identical for all the DNA molecules. Under experimental conditions, these times differ slightly, leading to a Gaussian distribution of times. The peak of the Gaussian distribution is characteristic of the distance of separation (d) between two labeled nucleotides.
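A minimal sketch of the population estimate follows. The five detection times are hypothetical values invented for illustration, not measured data, and the sample mean is used here as a simple estimate of the peak of the Gaussian distribution.

```python
from statistics import mean, stdev

# Hypothetical times (seconds) between the first and second labeled nucleotide,
# one value for each of five identical DNA molecules.
times_between_labels_s = [4.8, 5.1, 5.0, 4.9, 5.2]


def gaussian_peak_and_spread(times):
    """Estimate the peak (mean) and spread (sample standard deviation)."""
    return mean(times), stdev(times)


peak_time, spread = gaussian_peak_and_spread(times_between_labels_s)
```

With a larger population the estimate of the peak becomes more precise, which is why precision measurements on individual molecules are not required.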

[0099] An additional example utilizing a population of one nucleotiderandomly labeled nucleic acid molecule (six molecules represent thepopulation) further illustrates the concept of population analysis andthe determination of distance information. The nucleic acid isend-labeled to provide a reference point. With enough nucleic acidmolecules, the distance between any two A's can be determined. Twomolecules, when considered as a sub-population, convey the baseseparation molecules, distributions of 4 and 6 base separations arecreated. Extending the same logic to rest of the population, thepositions of all the A's on the DNA can be determined. The entiresequence is generated by repeating the process for the other three bases(C, G, and T).

[0100] In addition to labeling all of one type of unit specific marker in the above-described examples, it is possible to use various labeling schemes where not every nucleotide of the nucleotides or markers to be labeled is labeled. An outline of a one nucleotide labeling scheme where less than all of the one nucleotide are labeled is shown in FIG. 3, which depicts a representative population of random A-labeled fragments for a 16-mer with the sequence 5′-ACGTACGTACGTACGT-3′ (SEQ. ID. No. 3). Each individually labeled DNA molecule has half of its A's labeled in addition to 5′ and 3′ end labels. With a large population of randomly labeled fragments, the distance between every successive A on the target DNA can be found. The end labels serve to identify the distance between the ends of the DNA and the first A. Compiling the data identifies the position of all of the A's within that population of nucleic acid molecules. These steps can then be repeated using unit specific markers for the other nucleotides in the population of nucleic acids, which generates the sequence of the 16-mer. The advantages of using such a method include lack of steric effects and ease of labeling. This type of labeling is referred to as random labeling. A polymer which is “randomly labeled” is one in which fewer than all of a particular type of unit specific marker are labeled. It is unknown which unit specific markers of a particular type of a randomly labeled polymer are labeled.
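The compilation step for the 16-mer can be illustrated with an idealized sketch. It assumes that the position of each detected label relative to the 5′ end label is already known exactly (in practice it would be inferred from measured distances), and it enumerates every possible half-labeled molecule instead of sampling at random so that the result is deterministic. All function names are hypothetical.

```python
from itertools import combinations

TARGET = "ACGTACGTACGTACGT"  # SEQ. ID. No. 3


def half_labeled_population(sequence, base):
    """All molecules in which exactly half of the instances of `base` are labeled.

    Each molecule is represented by the set of 1-based label positions measured
    from the 5' end label.
    """
    positions = [i for i, b in enumerate(sequence, start=1) if b == base]
    return [set(chosen) for chosen in combinations(positions, len(positions) // 2)]


def compile_positions(population):
    """Pool the label positions observed across the whole population."""
    seen = set()
    for molecule in population:
        seen |= molecule
    return sorted(seen)


def assemble(length, positions_by_base):
    """Place each base at its compiled positions; '?' marks unresolved positions."""
    sequence = ["?"] * length
    for base, positions in positions_by_base.items():
        for position in positions:
            sequence[position - 1] = base
    return "".join(sequence)


positions_by_base = {
    base: compile_positions(half_labeled_population(TARGET, base)) for base in "ACGT"
}
recovered = assemble(len(TARGET), positions_by_base)
```

No single molecule carries all of the A's, yet pooling the population recovers every A position, and repeating the analysis for C, G, and T fills in the remaining positions.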

[0101] A similar type of analysis may be performed by labeling each of the four nucleotides incompletely but simultaneously within a population. For instance, each of the four nucleotides may be partially labeled with its own unit specific marker which gives rise to a different physical characteristic, such as color, size, etc. This can be accomplished to generate a data set containing information about all of the nucleotides from a single population analysis. For instance, the method may be accomplished by partially labeling two nucleotide pairs at one time. Two nucleotide labeling is possible through the lowering of steric hindrance effects by using unit specific markers which recognize the two nucleotides of a nucleic acid strand and which contain a label such as a single fluorescent molecule. Ambrose et al., 1993 and Harding and Keller, 1992 have demonstrated that large fluorescent nucleic acid molecules with two of the nucleotides completely labeled are possible to achieve. The average size of the molecules studied was 7 kb. FIG. 4 demonstrates a two nucleotide labeling scheme. Partial labeling of three nucleotides is also possible. For instance, each of three nucleotides is partially labeled with a different unit specific marker. In this case, a population of single stranded nucleic acid molecules which are partially labeled with three specific nucleotide pair combinations is generated and can be analyzed.

[0102] The methods of the invention can also be achieved using a double stranded nucleic acid. In a double stranded nucleic acid, when a single nucleotide on two of the strands is labeled, information about two nucleotides becomes available for each of the strands. For instance, in the random and partial labeling of A's, knowledge about the A's and T's becomes available. FIG. 5 demonstrates a labeling strategy in which two differently labeled nucleic acid samples are prepared. The first sample has two non-complementary nucleotides randomly labeled with the same fluorophore. Non-complementary pairs of nucleotides are AC, AG, TC, and TG. The second sample has one of its nucleotides randomly labeled. The nucleotide chosen for the second sample may be any one of the four nucleotides. In the example provided, the two non-complementary nucleotides are chosen to be A and C, and the single nucleotide is chosen to be A. Two samples are prepared, one with labeled A's and C's and another with labeled A's. The nucleic acid is genomically digested, end labeled, purified, and analyzed. Such procedures are well-known to those of ordinary skill in the art. The information from each fragment is sorted into one of two complementary strand groups. Sorting the information allows the population analysis to determine the positions of all the desired nucleotides. FIG. 6 illustrates the generation of sequence information from the sorted data. The first group of data provides known positions of all the A's and C's on one strand. The second group of data provides known positions of all of the A's. The combination of these two data sets reveals the position of all of the A's and C's on one strand. The same procedure may be applied to the complementary strand to determine the positions of the A's and C's on that strand. The resultant data reveals the entire sequence for both strands of the nucleic acid, based on the assumption that the strand includes the complementary nucleotide pairs of A and C (A:T and C:G). To cross-verify the sequence, the process can be repeated for the other pairs of non-complementary nucleotides such as TG, TC and AG.
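
The combination of the two data sets described above amounts to a set difference: positions present in the A and C data set but absent from the A data set must be C's. The following is a minimal Python sketch, assuming the label positions have already been recovered as integer offsets along one strand. The function name `split_a_and_c` and the toy positions are hypothetical and are not part of the disclosure.

```python
def split_a_and_c(ac_positions, a_positions):
    """Separate A and C positions given a combined A+C data set and an A-only data set.

    Both arguments are iterables of integer offsets along one strand
    (an assumed representation, chosen only for illustration).
    """
    ac = set(ac_positions)
    a = set(a_positions)
    if not a <= ac:
        # Every labeled A must also appear in the combined A+C data set.
        raise ValueError("A-only positions must be a subset of the A+C positions")
    # Whatever is in the combined set but not in the A-only set is a C.
    return sorted(a), sorted(ac - a)


# Hypothetical example: strand 5'-ACGTAC-3', 0-based offsets.
a_pos, c_pos = split_a_and_c([0, 1, 4, 5], [0, 4])
print("A positions:", a_pos)
print("C positions:", c_pos)
```

Applying the same step to the complementary strand would give the A and C positions there, which correspond to the T and G positions on the first strand.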

[0103] A single-stranded two-nucleotide labeling scheme also can be performed on double stranded DNA when two of the nucleotides on one strand of DNA are fully replaced by labeled nucleotides. To reduce the steric constraints imposed by two extrinsically labeled nucleotides while preserving the theory behind two-nucleotide labeling, it is possible to label one nucleotide fully on each of the complementary strands to achieve the same end. This method involves using double-stranded DNA in which each strand is labeled with a different label. Six differently labeled duplex DNA sets will produce a data set which is adequate to provide sequence information. Each complementary strand of DNA should have one of the nucleotides labeled. In each of the duplex DNA sets, the equivalent of two different nucleotides (possible combinations are AC, AG, AT, CG, CT, GT) is labeled. When both complementary strands have the adenines labeled, this is equivalent to the combination AT. In duplex two-nucleotide labeling, the advantage is that only one nucleotide on each strand is labeled, allowing longer labeled strands to be synthesized as compared to two-nucleotide labeling on single-stranded DNA. In practice, it has been shown that synthesis of DNA fragments with one nucleotide completely labeled can be achieved with lengths much greater than 10 kb (Ambrose et al., 1993; Harding and Keller, 1992).

[0104] By including more than one physical characteristic into the label, the simultaneous and overlapping reading of the nucleic acid within the same temporal frame may provide more accurate and rapid information about the positions of the labeled nucleotides than when only a single physical characteristic is included. For instance, as demonstrated in FIG. 7, each of the nucleotides can include a double or triple label with different wavelength fluorophores. Each of the fluorophores can be detected separately to provide distinct readings from the same sample.

[0105] In addition to the various combinations of single nucleotide labeling methods, two or more adjacent nucleotides may be specifically labeled. As described above, a unit specific marker includes markers which are specific for individual nucleotides as well as markers which are specific for multiple nucleotides. Multiple nucleotides include two or more nucleotides which may or may not be adjacent. For instance, if a unit specific marker is a complex of proteins, the complex of proteins may interact with specific nucleotides that are adjacent to one another or which are separated by random nucleotides. This type of analysis is particularly useful because detection of the signal requires less resolution than with single nucleotide analysis. The more complex the analysis, the greater the resolution of the system. Resolution as used herein refers to the number of nucleotides which can be resolved by the appropriate signal detection method used.

[0106] The signal detection method is described in more detail below. Preferably it includes methods such as nanochannel analysis, near-field scanning microscopy, atomic force microscopy, scanning electron microscopy, waveguide structures, etc.

[0107] The greater the number of nucleotides a unit specific marker spans and recognizes, the more amenable that unit specific marker is to low resolution means of detection. For any given number of nucleotide-spanning markers, the number of different unit specific markers which can be used is defined by the formula 4^(n), where n is the number of nucleotides detected by the unit specific marker. A unit specific marker which spans two nucleotides would be specific for one of 16 combinations of nucleotide pairs. These include AC, AG, AT, AA, CC, CA, CG, CT, GA, GG, GC, GT, TA, TC, TG, and TT. A unit specific marker which spans three nucleotides would be specific for one of 64 three-nucleotide combinations. Markers spanning more than three nucleotides may also be used, and the number of combinations would increase according to the above formula. Using these types of unit specific markers, nucleotide sequence information can be reconstructed through a number of different means. The information generated from the reconstruction of the unit specific markers is not limited to the generation of sequence information, but additionally can be used to unambiguously identify fragments, provide the specific number of that combination of nucleotides found within the sequence, etc.
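
The 4^(n) count can be checked by enumerating every nucleotide combination of a given span. The following is a small Python sketch for illustration only; the function name `unit_specific_markers` is a hypothetical helper, not terminology from the disclosure.

```python
from itertools import product


def unit_specific_markers(n, alphabet="ACGT"):
    """Return every possible n-nucleotide combination a marker could be specific for."""
    return ["".join(combo) for combo in product(alphabet, repeat=n)]


for span in (1, 2, 3, 4):
    markers = unit_specific_markers(span)
    # The number of combinations grows as 4**span.
    print(span, len(markers), markers[:4])
```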

[0108] Various combinations of triplet unit specific markers bound to a nucleic acid molecule can be deciphered and analyzed using these methods. Without knowing the precise location of the triplet unit specific markers on the nucleic acid, the specificity given to a bound nucleic acid fragment is given as N/4^(n), where N is the number of nucleotides in the fragment of target nucleic acid and n is the number of bound sites on the nucleic acid. The longer the strand of nucleic acid, the lower the specificity of the particular system. The specificity of the bound unit specific markers can be increased by determining the precise location of the triplet unit specific markers. In this case, the specificity is increased to 1/4^(n), which is the same as if an N-mer were bound to the target strand of nucleic acid.

[0109] The simplest method to determine the sequence of the nucleic acid molecule from the set of triplet unit specific markers is to examine two triplet unit specific markers at one time until all 64 unit specific markers are examined. If one of the triplet unit specific markers is kept constant during the analysis, the analysis is simplified. An example of this type of analysis is presented in FIG. 8. In the example, a short stretch of nucleic acid is analyzed using two triplet unit specific markers. The triplet unit specific markers are CGX and GXX. Using these markers, the two base positions after the first ACG triplet can be determined. Using the 63 different triplets together with the initial fragment ACG, information about flanking nucleotides and the contiguous sequence of the intervening nucleotides between the ACGs can be determined.

[0110] Using these methods of sequence analysis, problems which occur in other types of analysis, such as hybridization, are avoided. For instance, repeated sequences such as the Alu repeats in the human genome create analysis problems using hybridization sequencing methods. Such problems are avoided using the methods described herein. Using the methods described herein, the number of repeats can be simply counted by the different triplets bound in each of the states. Hybridization sequencing analysis does not allow the determination of linear order or number of probes found between two probe sequences. The linear order and the precise quantitation of the number of probes bound allows an additional order of information which bypasses the difficulties faced in sequencing by hybridization. The methods of the invention are thus rapid and straightforward.

[0111] The method using triplet, etc. unit specific markers does not need to be performed sequentially. For instance, several triplets may be assayed simultaneously to provide an even more rapid method of analysis. The only limitation in simultaneous analysis is that none of the triplet unit specific markers used simultaneously should overlap one another. Therefore, the choice of one particular triplet sequence precludes the simultaneous use of triplet sequences which would overlap with that sequence. For example, if the triplet sequence ACG is selected for analysis, 4 of the 64 sets of triplets may not be used during simultaneous analysis with this triplet. These include XXA, XAC, GXX, and CGX. Mathematically, the maximum number of fragments which a triplet label can preclude simultaneous probing with is determined by the following equation:

2[4² + 4¹] or generally 2[4^(n−1) + 4^(n−2) + . . . + 4¹]

[0112] where n is the number of nucleotides spanned by the labels. For triplets (n = 3) the sum is 40, so a maximum of 40 fragments are precluded from simultaneous assay with the originally selected ACG triplet. Therefore, a total of 24 different fragments may be assayed at one time.
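
The arithmetic in the preceding equation can be reproduced directly. The sketch below, in Python, simply evaluates the stated formula for a label spanning n nucleotides and subtracts the result from 4^(n); the function names are hypothetical, and the code makes no attempt to model the figures or to account for triplets that match more than one overlapping pattern.

```python
def max_precluded(n):
    """Evaluate 2[4^(n-1) + 4^(n-2) + ... + 4^1] for a label spanning n nucleotides."""
    return 2 * sum(4 ** k for k in range(1, n))


def max_simultaneous(n):
    """Number of n-nucleotide fragments left after removing the precluded maximum."""
    return 4 ** n - max_precluded(n)


# Triplet labels (n = 3): 2 * (16 + 4) precluded, the remainder assayable at once.
print(max_precluded(3), max_simultaneous(3))
```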

[0113] Double stranded nucleic acid analysis also may be accomplished using direction specific labels such as those shown in FIG. 9. Direction specific labels allow for discrimination between a combination of nucleotides, such as an ACG triplet, on either strand. In the case of direction specific labels, the reversal of the center bound label shows that it is a label bound on the opposite strand. The labels have 5′ to 3′ or 3′ to 5′ directionality.

[0114] One use for the methods of the invention is to determine the sequence of units within a polymer. Identifying the sequence of units of a polymer, such as a nucleic acid, is an important step in understanding the function of the polymer and determining the role of the polymer in a physiological environment such as a cell or tissue. The sequencing methods currently in use are slow and cumbersome. The methods of the invention are much quicker and generate significantly more sequence data in a very short period of time.

[0115] The analysis methods described herein may be linear or nonlinear. The methods for generating sequence information based on data obtained from partially labeled polymers can be applied to data obtained by any method that produces polymer dependent impulses. The reconstruction of the sequence of the polymer from this type of data is an integral aspect of the invention. As long as the data is obtained by a method for detecting the polymer dependent impulses, whether it is obtained in a linear manner or not, the data may be analyzed according to the methods of the invention.

[0116] The signals may be detected sequentially or simultaneously. As used herein signals are detected “sequentially” when signals from different unit specific markers of a single polymer are detected spaced apart in time. Not all unit specific markers need to be detected or need to generate a signal to detect signals “sequentially.” When the unit specific markers are sequentially exposed to the station the unit specific marker and the station move relative to one another. As used herein the phrase “the unit specific marker and the station move relative to one another” means that either the unit specific marker and the station are both moving or only one of the two is moving and the other remains stationary at least during the period of time of the interaction between the unit specific marker and the station. The unit specific marker and the station may be moved relative to one another by any mechanism. For instance the station may remain stationary and the polymer may be drawn past the station by an electric current. Other methods for moving the polymer include but are not limited to movement resulting from a magnetic field, a mechanical force, a flowing liquid medium, a pressure system, a gravitational force, and a molecular motor such as, e.g., a DNA polymerase or a helicase when the polymer is DNA or, e.g., myosin when the polymer is a peptide such as actin. The movement of the polymer may be assisted by the use of a channel, groove or ring to guide the polymer. Alternatively the station may be moved and the polymer may remain stationary. For instance the station may be held within a scanning tip that is guided along the length of the polymer.

[0117] In another embodiment signals are detected simultaneously. As used herein signals are “detected simultaneously” by causing a plurality of the labeled unit specific markers of a polymer to be exposed to a station at once. The plurality of the unit specific markers can be exposed to a station at one time by using multiple interaction sites. Signals can be detected at each of these sites simultaneously. For instance multiple stations may be localized at specific locations in space which correspond to the unit specific markers of the polymer. When the polymer is brought within interactive proximity of the multiple stations, signals will be generated simultaneously. This may be embodied, for example, in a linear array of stations positioned at substantially equivalent distances which are equal to the distance between the unit specific markers. The polymer may be positioned with respect to the station such that each unit specific marker is in interactive proximity to a station to produce simultaneous signals.

[0118] Multiple polymers can be analyzed simultaneously by causing more than one polymer to move relative to respective stations at one time. The polymers may be similar or distinct. If the polymers are similar, the same or different unit specific markers may be detected simultaneously.

[0119] A preferred method for moving a polymer past a station according to the invention utilizes an electric field. An electric field can be used to pull a polymer through a channel because the polymer becomes stretched and aligned in the direction of the applied field, as has previously been demonstrated in several studies (Bustamante, 1991; Gurrieri et al., 1990; Matsumoto et al., 1981).

[0120] Another method for moving a polymer past a station involves the use of a molecular motor. A molecular motor is a device which physically interacts with the polymer and pulls the polymer past the station. Molecular motors include but are not limited to DNA and RNA polymerases and helicases. DNA polymerases have been demonstrated to function as efficient molecular motors. Preferably the internal diameter of the region of the polymerase which clamps onto the DNA is similar to that of double stranded DNA. Furthermore, large amounts of DNA can be threaded through the clamp in a linear fashion. Molecular motors are described in more detail in co-pending U.S. provisional patent application No. 60/096,540, filed Aug. 13, 1998 and the U.S. patent application claiming priority thereto, the entire contents of which are hereby incorporated by reference.

[0121] The overall structure of the β-subunit of DNA polymerase III holoenzyme is 80 Å in diameter with an internal diameter of ~35 Å. In comparison, a full turn of duplex B-form DNA is ~34 Å. The β-subunit fits around the DNA, in a mechanism referred to as a sliding clamp mechanism, to mediate the processive motion of the holoenzyme during DNA replication. It is well understood that the β-subunit encircles DNA during replication to confer processivity to the holoenzyme (Bloom et al., 1996; Fu et al., 1996; Griep, 1995; Herendeen and Kelly, 1996; Naktinis et al., 1996; Paz-Elizur et al., 1996; Skaliter et al., 1996). Because the sliding clamp is the mechanism of processivity for a polymerase, it necessarily means that large amounts of DNA are threaded through the clamp in a linear fashion. Several kilobases are threaded through the clamp at one time (Kornberg and Baker, 1991).

[0122] The detectable signal (polymer dependent impulse) is produced at a station. A “station” as used herein is a region to which a portion of the polymer to be detected, e.g., the unit specific marker, is exposed in order to produce a signal or polymer dependent impulse. The station may be composed of any material including a gas. Preferably the station is a non-liquid material. “Non-liquid” has its ordinary meaning in the art. A liquid is a non-solid, non-gaseous material characterized by free movement of its constituent molecules among themselves but without the tendency to separate. In another preferred embodiment the station is a solid material. In one embodiment, when the interaction between the unit specific marker and the station produces a polymer dependent impulse, the station is a signal generation station. One type of signal generation station is an interaction station. As used herein an “interaction station or site” is a region where a unit specific marker of the polymer interacts with an agent and is positioned with respect to the agent in close enough proximity whereby they can interact. The interaction station for fluorophores, for example, is that region where they are close enough so that they energetically interact to produce a signal.

[0123] The interaction station in one embodiment is a region of a nanochannel where a localized agent, such as an acceptor fluorophore, attached to the wall forming the channel, can interact with a polymer passing through the channel. The point where the polymer passes the localized region of agent is the interaction station. As each labeled unit specific marker of the polymer passes by the agent a detectable signal is generated. The agent may be localized within the region of the channel in a variety of ways. For instance the agent may be embedded in the material that forms the wall of the channel or the agent may be attached to the surface of the wall material. Alternatively the agent may be a light source which is positioned a distance from the channel but which is capable of transporting light directly to a region of the channel through a waveguide. An apparatus may also be used in which multiple polymers are transported through multiple channels. These and other related embodiments of the invention are discussed in more detail below. The movement of the polymer may be assisted by the use of a groove or ring to guide the polymer.

[0124] Other arrangements for creating interaction stations are embraced by the invention. For example, a polymer can be passed through a molecular motor tethered to the surface of a wall or embedded in a wall, thereby bringing unit specific markers of the polymer sequentially to a specific location, preferably in interactive proximity to a proximate agent, thereby defining an interaction station. A molecular motor is a biological compound such as polymerase, helicase, or actin which interacts with the polymer and is transported along the length of the polymer past each unit specific marker. Likewise, the polymer can be held from movement and a reader can be moved along the polymer, the reader having attached to it the agent. For instance the agent may be held within a scanning tip that is guided along the length of the polymer. Interaction stations then are created as the agent is moved into interactive proximity to each unit specific marker of the polymer.

[0125] The agent that interacts with the unit specific marker of the polymer at the interaction station is selected from the group consisting of electromagnetic radiation, a quenching source, and a fluorescence excitation source. “Electromagnetic radiation” as used herein is energy produced by electromagnetic waves. Electromagnetic radiation may be in the form of a direct light source or it may be emitted by a light emissive compound such as a donor fluorophore. “Light” as used herein includes electromagnetic energy of any wavelength including visible, infrared and ultraviolet.

[0126] As used herein, a quenching source is any entity which alters or is capable of altering a property of a light emitting source. The property which is altered can include intensity, fluorescence lifetime, spectra, fluorescence, or phosphorescence.

[0127] A fluorescence excitation source as used herein is any entity capable of fluorescing or giving rise to photonic emissions (i.e., electromagnetic radiation, directed electric field, temperature, fluorescence, radiation, scintillation, physical contact, or mechanical disruption). For instance, when the unit specific marker is labeled with a radioactive compound, the radioactive emission causes molecular excitation of an agent that is a scintillation layer, which results in fluorescence.

[0128] When a unit specific marker of the polymer is exposed to the agent the interaction between the two produces a signal. The signal provides information about the polymer. For instance, if all unit specific markers of a particular type, e.g., all of the alanines, of a protein polymer are labeled (intrinsic or extrinsic) with a particular light emissive compound, then when a signal characteristic of that light emissive compound is detected upon interaction with the agent the signal signifies that an alanine residue is present at that particular location on the polymer. If each type of unit specific marker, e.g., each type of amino acid, is labeled with a different light emissive compound having a distinct light emissive pattern, then each amino acid will interact with the agent to produce a distinct signal. By determining what each signal for each unit specific marker of the polymer is, the sequence of units can be determined.

[0129] The interaction between the unit specific marker and the agent can take a variety of forms, but does not require that the unit specific marker and the agent physically contact one another. Examples of interactions are as follows. A first type of interaction involves the agent being electromagnetic radiation and the unit specific marker of the polymer being a light emissive compound (either intrinsically or extrinsically labeled with a light emissive compound). When the light emissive unit specific marker is contacted with electromagnetic radiation (such as by a laser beam of a suitable wavelength or electromagnetic radiation emitted from a donor fluorophore), the electromagnetic radiation causes the light emissive compound to emit electromagnetic radiation of a specific wavelength. The signal is then measured. The signal exhibits a characteristic pattern of light emission and thus indicates that a particular labeled unit specific marker of the polymer is present. In this case the unit specific marker of the polymer is said to “detectably affect the emission of the electromagnetic radiation from the light emissive compound.”

[0130] A second type of interaction involves the agent being a fluorescence excitation source and the unit specific marker of the polymer being a light emissive or a radioactive compound. When the light emissive unit specific marker is contacted with the fluorescence excitation source, the fluorescence excitation source causes the light emissive compound to emit electromagnetic radiation of a specific wavelength. When the radioactive unit specific marker is contacted with the fluorescence excitation source, the nuclear radiation emitted from the unit specific marker causes the fluorescence excitation source to emit electromagnetic radiation of a specific wavelength. The signal then is measured.

[0131] A variation of these types of interaction involves the presence of a third element of the interaction, a proximate compound which is involved in generating the signal. For example, a unit specific marker may be labeled with a light emissive compound which is a donor fluorophore and a proximate compound can be an acceptor fluorophore. If the light emissive compound is placed in an excited state and brought proximate to the acceptor fluorophore, then energy transfer will occur between the donor and acceptor, generating a signal which can be detected as a measure of the presence of the unit specific marker which is light emissive. The light emissive compound can be placed in the “excited” state by exposing it to light (such as a laser beam) or by exposing it to a fluorescence excitation source.

[0132] Another interaction involves a proximate compound which is a quenching source. In this instance, the light emissive unit specific marker is caused to emit electromagnetic radiation by exposing it to light. If the light emissive compound is placed in proximity to a quenching source, then the signal from the light emissive unit specific marker will be altered.

[0133] A set of interactions parallel to those described above can be created wherein, however, the light emissive compound is the proximate compound and the unit specific marker is either a quenching source or an acceptor source. In these instances the agent is electromagnetic radiation emitted by the proximate compound, and the signal is generated, characteristic of the interaction between the unit specific marker and such radiation, by bringing the unit specific marker in interactive proximity with the proximate compound.

[0134] The mechanisms by which each of these interactions produces a detectable signal are known in the art. For exemplary purposes, the mechanism by which a donor and acceptor fluorophore interact according to the invention to produce a detectable signal, including practical limitations which are known to result from this type of interaction and methods of reducing or eliminating such limitations, is set forth below.

[0135] Another preferred method of analysis of the invention involves the use of radioactively labeled polymers. The type of radioactive emission influences the type of detection device used. In general, there are three different types of nuclear emission: alpha, beta, and gamma radiation. Alpha emissions cause extensive ionization in matter and permit individual counting by ionization chambers and proportional counters, but more interestingly, alpha emissions interacting with matter may also cause molecular excitation, which can result in fluorescence. The fluorescence is referred to as scintillation. Beta decay, which is weaker than alpha decay, can be amplified to generate an adequate signal. Gamma radiation arises from internal conversion of excitation energy. Scintillation counting of gamma rays is efficient and produces a strong signal. Sodium iodide crystals fluoresce with incident gamma radiation.

[0136] A “scintillation” layer or material as used herein is any type of material which fluoresces or emits light in response to excitation by nuclear radiation. Scintillation materials are well known in the art. Aromatic hydrocarbons which have resonance structures are excellent scintillators. Anthracene and stilbene fall into the category of such compounds. Inorganic crystals are also known to fluoresce. In order for these compounds to luminesce, the inorganic crystals must have small amounts of impurities, which create energy levels between valence and conduction bands. Excitation and de-excitation can therefore occur. In many cases, the de-excitation can occur through phosphorescent photon emission, leading to a long lifetime of detection. Some common scintillators include NaI (Tl), ZnS (Ag), anthracene, stilbene, and plastic phosphors.

[0137] Many methods of measuring nuclear radiation are known in the art and include devices such as cloud and bubble chamber devices, constant current ion chambers, pulse counters, gas counters (i.e., Geiger-Müller counters), solid state detectors (surface barrier detectors, lithium-drifted detectors, intrinsic germanium detectors), scintillation counters, Cerenkov detectors, etc.

[0138] Analysis of the radiolabeled polymers is identical to other means of generating polymer dependent impulses. For example, a sample with radiolabeled A's can be analyzed by the system to determine relative spacing of A's on a sample DNA. The time between detection of radiation signals is characteristic of the polymer analyzed. Analysis of four populations of labeled DNA (A's, C's, G's, T's) can yield the sequence of the polymer analyzed. The sequence of DNA can also be analyzed with a more complex scheme including analysis of a combination of dual labeled DNA and singly labeled DNA. Analysis of an A and C labeled fragment followed by analysis of an A labeled version of the same fragment yields knowledge of the positions of the A's and C's. The sequence is known if the procedure is repeated for the complementary strand. The system can further be used for analysis of polymer (polypeptide, RNA, carbohydrates, etc.), size, concentration, type, identity, presence, sequence and number.
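
To illustrate how the time between detected signals relates to the relative spacing of labels, the following Python sketch converts a list of detection times into distances between consecutive labels. It assumes, purely for illustration, that the polymer passes the station at a constant velocity; the function name, the `velocity` parameter, and the sample times are hypothetical and are not taken from the disclosure.

```python
def relative_spacings(detection_times, velocity=1.0):
    """Convert detection times into distances between consecutive labels.

    Assumes a constant polymer velocity past the station (an illustrative
    assumption); units are whatever the caller uses for time and velocity.
    """
    times = sorted(detection_times)
    return [(later - earlier) * velocity for earlier, later in zip(times, times[1:])]


# Hypothetical detection times (arbitrary units) for radiolabeled A's.
print(relative_spacings([0.0, 2.0, 5.0], velocity=2.0))
```

If the velocity is not known, the ratios between successive spacings still characterize the relative positions of the labels under the same constant-velocity assumption.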

[0139] The methods described above can be performed on a single polymer or on more than one polymer in order to determine structural information about the polymer.

[0140] In another preferred embodiment the signal generated by the interaction between the unit specific marker and the agent results from fluorescence resonance energy transfer (FRET) between fluorophores. Either the unit specific marker or the proximate compound/agent may be labeled with either the donor or acceptor fluorophore. FRET is the transfer of photonic energy between fluorophores. FRET has promise as a tool in characterizing molecular detail because of its ability to measure distances between two points separated by 10 Å to 100 Å. The angstrom resolution of FRET has been used in many studies of molecular dynamics and biophysical phenomena (for reviews see Clegg, 1995; Clegg, 1992; Selvin, 1995; and Wu and Brand, 1994). The resolving power of FRET arises because energy transfer between donor and acceptor fluorophores is dependent on the inverse sixth power of the distance between the probes. In practice, this resolution is about an order of magnitude better than that of the highest resolution electron microscope.
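
The inverse sixth power dependence can be made concrete with the standard Förster expression for transfer efficiency, E = 1/(1 + (r/R0)^6), where r is the donor-acceptor distance and R0 is the distance at which transfer is 50% efficient. The Python sketch below evaluates this expression; the value R0 = 50 Å is an assumed, illustrative figure for a generic donor-acceptor pair and does not come from the disclosure.

```python
def fret_efficiency(r_angstrom, r0_angstrom=50.0):
    """Forster transfer efficiency E = 1 / (1 + (r / R0)**6).

    r0_angstrom is the Forster radius; 50 A is an assumed illustrative value.
    """
    if r_angstrom < 0 or r0_angstrom <= 0:
        raise ValueError("distances must be non-negative and R0 must be positive")
    return 1.0 / (1.0 + (r_angstrom / r0_angstrom) ** 6)


# Efficiency falls steeply as the separation passes through R0.
for r in (10, 25, 50, 75, 100):
    print(r, round(fret_efficiency(r), 4))
```

The steep fall in efficiency around R0 is what allows small changes in separation within the 10 Å to 100 Å range to produce a large change in signal.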

[0141] For FRET to occur, the emission spectrum of the donor must overlap with the excitation spectrum of the acceptor. The unit specific marker of the polymer is specifically labeled with an acceptor fluorophore. The agent is a donor fluorophore. A laser is tuned to the excitation wavelength of the donor fluorophore. As the polymer is moved through the channel, the donor fluorophore emits its characteristic wavelength. As the acceptor fluorophore moves into interactive proximity with the donor fluorophore, the acceptor fluorophore is excited by the energy from the donor fluorophore. The consequence of this interaction is that the emission of the donor fluorophore is quenched and that of the acceptor fluorophore is enhanced.

[0142] In order to generate an optimally efficient FRET signal for detection, two conditions should be satisfied. The first condition is efficient donor emission in the absence of acceptors. The second is efficient generation of a change in either donor or acceptor emissions during FRET. Each of these is described in more detail in co-pending PCT Patent Application PCT/US98/03024 and U.S. Ser. No. 09/134,411.

[0143] A “detectable signal” as used herein is any type of signal or polymer dependent impulse which can be sensed by conventional technology. The signal produced depends on the type of station as well as the unit specific marker and the proximate compound if present. In one embodiment the signal is electromagnetic radiation resulting from light emission by a labeled (intrinsic or extrinsic) unit specific marker of the polymer or by the proximate compound. In another embodiment the signal is fluorescence resulting from an interaction of a radioactive emission with a scintillation layer. The detected signals may be stored in a database for analysis. One method for analyzing the stored signals is by comparing the stored signals to a pattern of signals from another polymer to determine the relatedness of the two polymers. Another method for analysis of the detected signals is by comparing the detected signals to a known pattern of signals characteristic of a known polymer to determine the relatedness of the polymer being analyzed to the known polymer. Comparison of signals is discussed in more detail below.

[0144] More than one detectable signal may be detected. For instance a first individual unit specific marker may interact with the agent or station to produce a first detectable signal and a second individual unit specific marker may interact with the agent or station to produce a second detectable signal different from the first detectable signal. This enables more than one type of unit specific marker to be detected on a single polymer.

[0145] Once the signal is generated it can then be detected. The particular type of detection means will depend on the type of signal generated, which of course will depend on the type of interaction which occurs between the unit specific marker and the agent. Many interactions involved in the method of the invention will produce an electromagnetic radiation signal. Many methods are known in the art for detecting electromagnetic radiation signals, including two- and three-dimensional imaging systems. These and other systems are described in more detail in co-pending PCT Patent Application PCT/US98/03024 and U.S. Ser. No. 09/134,411.

[0146] Other interactions involved in the method will produce a nuclear radiation signal. As a radiolabel on a polymer passes through the defined region of detection, such as the station, nuclear radiation is emitted, some of which will pass through the defined region of radiation detection. A detector of nuclear radiation is placed in proximity of the defined region of radiation detection to capture emitted radiation signals. Many methods of measuring nuclear radiation are known in the art including cloud and bubble chamber devices, constant current ion chambers, pulse counters, gas counters (i.e., Geiger-Müller counters), solid state detectors (surface barrier detectors, lithium-drifted detectors, intrinsic germanium detectors), scintillation counters, Cerenkov detectors, etc.

[0147] Other types of signals generated are well known in the art and have many detection means which are known to those of skill in the art. Among these are opposing electrodes, magnetic resonance, and piezoelectric scanning tips. Opposing nanoelectrodes can function by measurement of capacitance changes. Two opposing electrodes create an area of energy storage, which is effectively between the two electrodes. It is known that the capacitance of two opposing electrodes changes when different materials are placed between the electrodes. The property of the material that governs this change is known as its dielectric constant. Changes in the dielectric constant can be measured as a change in the voltage across the two electrodes. In the present example, different nucleotide bases or unit specific markers of a polymer may give rise to different dielectric constants. The capacitance changes with the dielectric constant of the unit specific marker of the polymer per the equation C = KC_0, where K is the dielectric constant and C_0 is the capacitance in the absence of any bases. The voltage deflection of the nanoelectrodes is then output to a measuring device, which records changes in the signal with time.
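The capacitance relation can be illustrated with a short numerical sketch. It assumes, purely for illustration, that the charge on the electrodes is held constant so that the voltage follows V = Q/C; the function names and all numerical values (including the dielectric constants) are hypothetical and are not taken from the specification.

```python
def capacitance(k, c0):
    """Capacitance with a material of dielectric constant k between the electrodes."""
    return k * c0


def voltage_at_fixed_charge(charge, k, c0):
    """Voltage across the electrodes, assuming the stored charge is held constant."""
    return charge / capacitance(k, c0)


# Hypothetical values: empty-gap capacitance, stored charge, and two markers
# with different dielectric constants.
C0 = 1.0e-18      # farads
CHARGE = 1.0e-19  # coulombs
v_marker_1 = voltage_at_fixed_charge(CHARGE, k=2.0, c0=C0)
v_marker_2 = voltage_at_fixed_charge(CHARGE, k=4.0, c0=C0)
```

Under this assumption, doubling the dielectric constant halves the voltage, so markers with different dielectric constants would appear as different voltage deflections in the recorded trace.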

[0148] A nanosized NMR detection device can be constructed to detect the passage of specific spin-labeled polymer unit specific markers. The nanosized NMR detection device consists of magnets which can be swept and a means of irradiating the polymer with electromagnetic energy of a constant frequency. (This is equivalent to holding the magnetic field constant while the electromagnetic frequency is swept.) When the magnetic field reaches the correct strength, the nuclei absorb energy and resonance occurs. This absorption causes a tiny electric current to flow in an antenna coil surrounding the sample. The signal is amplified and output to a recording device. For known labeled compounds, the time of detection is much shorter than with current means of NMR detection, where a full spectrum of the compound in question is required. Known labeled unit specific markers of polymers have known chemical shifts in particular regions, thereby eliminating the need to perform full spectral sweeps and lowering the time of detection per base to microseconds or milliseconds.

[0149] A nanoscale piezoelectric scanning tip can be used to read the different unit specific markers of the polymer based on physical contact of the different polymer unit specific markers with the tip. Depending on the size and shape of the polymer unit specific marker, different piezoelectric signals are generated, creating a series of unit specific marker dependent changes. Labels on unit specific markers are physically different from native units and can create a ready means for detection via a piezoelectric scanning tip. Upon contact of a polymer unit specific marker with the tip, the piezoelectric crystals change and give rise to a current which is output to a detection device. The amplitude and duration of the current created by the interaction of the polymer unit specific marker and the tip are characteristic of the polymer unit specific marker.

[0150] In one preferred type of linear analysis, the labeled polymer is fixed in a relative position to a station by a nanochannel, such that as the labeled polymer passes the station, signals arising from the interaction between the station and the labeled polymer are spatially confined. The channels preferably correspond to the diameter of the labeled polymer and fix the DNA relative to an imaging system which is able to capture many emissions from the labeled polymer over an integrated period of time. The method is specific for the analysis of intensities of individual molecules. The nanochannel system is provided as an example and is discussed in more detail below. Any means can be used to fix the labeled polymers in a dimension for analysis by an optical method capable of analyzing the signals over time. Examples of devices which are capable of positioning labeled polymers for analysis include nanochannel arrays, integrated nanofabricated waveguides, and various lattices.

[0151] The methods of the invention are described herein with reference to several examples using polymers which are fluorescently labeled and which are analyzed using a nanochannel device to simplify the discussion. The invention, however, is not limited to these examples and the other embodiments will be described more fully herein. In the example, fluorescently labeled polymers are drawn through a series of nanochannels. The planar surface of the nanochannels is illuminated via epiillumination. As the polymers cross the nanochannels, they are fixed in position relative to the imaging device, allowing for integration of the polymers' fluorescence over time. The corresponding brightness of the fluorescence spots is indicative of the approximate length of the labeled polymers. The rationale behind the use of the restrictive structures is to fix the molecules relative to the detection system so that photon collection can occur in a fixed spatial dimension over time. In the embodiments when intensity is not being measured there is no need to fix the polymer relative to the imaging device. The time period that the polymer is fixed may be very short, and is just enough to make an intensity measurement.

[0152] As mentioned above, the signals arising from the polymers provide structural information about the polymer. The types of structural information that can be obtained are dependent on how the polymer is labeled and include, for instance, all of the information described above as “unit specific information.” The presence or absence of a particular sequence can be established by determining whether any polymers within the sample express a characteristic pattern of individual units which is only found in the polymer of interest, i.e., by comparing the detected signals to a known pattern of signals characteristic of a known polymer to determine the relatedness of the polymer being analyzed to the known polymer. The entire sequence of the polymer of interest does not need to be determined in order to establish the presence or absence of the polymer in the sample. Similarly the methods may be useful for comparing the signals detected from one polymer to a pattern of signals from another polymer to determine the relatedness of the two polymers.

[0153] In one example the structural information provides the length of the polymer. The lengths of polymers are most easily determined by labeling all of the units of the polymer. As the polymers migrate past their respective restrictive nanostructures in a linear fashion, the photon flux from each of the restricted polymers is determined. An example using a nucleic acid is illustrated in FIGS. 1-11 wherein the nucleic acid is drawn through small channels in a membrane in a linear fashion by electrophoresis. The nucleic acid is labeled throughout the strand. The nucleic acid is excited by fluorescence epiillumination. The blackened membrane does not permit excitation of molecules below the membrane. As the fluorescently labeled nucleic acid molecules migrate past the nanochannels, they are fixed spatially. Integration of the collected signal gives rise to discrete, spatially separated intensity spots on the fluorescence image. The intensity of the spots is directly proportional to the length of the nucleic acid molecules. After electrophoretic migration past the channels, the nucleic acid is drawn out of the imaging area by a displaced electrode. Experimental details are provided in the Examples below.
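Because spot intensity is stated to be directly proportional to length, a length estimate can be sketched by calibrating against a polymer of known length. The calibration step, the function names, and every number below are assumptions introduced for illustration; the specification does not describe a particular calibration procedure.

```python
def intensity_per_unit(standard_intensity, standard_length):
    """Integrated intensity contributed by one labeled unit, from a known standard."""
    if standard_length <= 0:
        raise ValueError("standard length must be positive")
    return standard_intensity / standard_length


def estimate_length(spot_intensity, per_unit):
    """Estimate polymer length (in units) from the integrated spot intensity."""
    if per_unit <= 0:
        raise ValueError("per-unit intensity must be positive")
    return spot_intensity / per_unit


# Hypothetical calibration: a 1,000-unit standard gives 10,000 integrated counts.
per_unit = intensity_per_unit(standard_intensity=10_000.0, standard_length=1_000)
length = estimate_length(spot_intensity=25_000.0, per_unit=per_unit)
```

In this sketch a spot of 25,000 counts corresponds to roughly 2,500 labeled units, assuming the same labeling density and integration time as the standard.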

[0154] The determination of the length of the polymer can be performed on a single polymer if each unit of the polymer is labeled. If only most of the units of each polymer are labeled, the length can still be determined by performing the method on a plurality of identical polymers and comparing the intensities of the entire population.

[0155] In another example the structural information provides the number of polymers passing through a restrictive nanostructure in a given amount of time. The number of polymers can be determined by labeling each polymer with a fixed number of fluorophores. This can be accomplished for example by phosphate end-labeling the polymers using kits available from Molecular Probes. The brightness of the intensity spots in the analysis should be in fixed increments of two labels. A restrictive nanostructure with four polymers passing through it in a time (t) would give rise to an intensity spot of eight fluorophores.
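The counting arithmetic can be sketched as follows, assuming two fluorophores per end-labeled polymer as in the example above. The function name and the single-fluorophore intensity used for normalization are hypothetical.

```python
def polymers_in_spot(spot_intensity, per_fluorophore, labels_per_polymer=2):
    """Number of polymers contributing to an intensity spot.

    The spot intensity is converted to a fluorophore count and then divided
    by the fixed number of labels carried by each polymer.
    """
    if per_fluorophore <= 0 or labels_per_polymer <= 0:
        raise ValueError("intensity per fluorophore and label count must be positive")
    fluorophores = spot_intensity / per_fluorophore
    return round(fluorophores / labels_per_polymer)


# A spot equivalent to eight fluorophores corresponds to four polymers.
count = polymers_in_spot(spot_intensity=8.0, per_fluorophore=1.0)
```

Rounding to the nearest whole polymer reflects the expectation that intensities fall in fixed increments, so small measurement noise does not change the count.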

[0156] In yet another example the structural information provides unit specific information about the polymer. In an example ssDNA is labeled with peptide nucleic acids (PNAs) which bind with high T_m to the desired and specific sequences. The PNAs are cross-linked to the single-stranded DNA through UV crosslinking and the formation of PNA-DNA covalent complexes. The labeled PNAs are detected in the restrictive nanostructures. Based on the brightness of the individual wavelengths corresponding to the different probes used, information about the composition of internal sequences of a strand in question can be determined.
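One way to read composition from per-wavelength brightness is to divide each channel's intensity by the brightness of a single bound probe of that color. The sketch below assumes such single-probe reference intensities are available; the function name, channel names, and values are hypothetical.

```python
def bound_probe_counts(channel_intensity, single_probe_intensity):
    """Estimate how many probes of each color are bound to one strand.

    Both arguments map a probe/channel name to an intensity; the second holds
    the (assumed known) intensity of one bound probe in that channel.
    """
    counts = {}
    for probe, intensity in channel_intensity.items():
        reference = single_probe_intensity[probe]
        if reference <= 0:
            raise ValueError(f"reference intensity for {probe} must be positive")
        counts[probe] = round(intensity / reference)
    return counts


# Hypothetical measurements for two differently colored PNA probes.
measured = {"probe_520nm": 30.0, "probe_580nm": 10.5}
reference = {"probe_520nm": 10.0, "probe_580nm": 5.0}
composition = bound_probe_counts(measured, reference)
```

Here the strand would be estimated to carry three copies of the first probe's target sequence and two copies of the second.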

[0157] An example of a restrictive structure which enables measurement of intensity signals at a fixed station is a nanostructure fabricated by e-beam lithography and reactive ion etching as shown in FIG. 12. High resolution e-beam lithography is capable of fabricating structures with gap sizes on the order of 10 nm. These small structures can be used to restrict the movement of DNA. The structure shown in FIG. 12 is nanofabricated from quartz by e-beam lithography and reactive ion etching. Preferably it is a waveguide with a metallic surface. Multicolor excitation at each of the waveguides is initiated to excite the individual DNA molecules. The surfaces of the substrate are cleaned by an oxygen plasma cleaner. The hydrophilic surfaces are wetted and the tops of the structures are sealed with a coverslip spin coated with silicone (Newark Electronics, NJ). The DNA molecules are migrated through the structures electrophoretically. Intensity images of the molecules are obtained. The total photon count per molecule is proportional to the length of the molecule.

[0158] Another example of a restrictive nanostructure which may be used is a monolayer of hexagonally packed beads which has the configuration shown in FIG. 13. The dimensions of the geometric areas between the beads can be easily calculated.

[0159] The methods of the invention can be used to identify one, some, or many of the units of the polymer. This is achieved by identifying the type of individual unit and its position on the backbone of the polymer by determining the intensity of a signal arising from the labeled polymer using labeled sequence specific probes as described in greater detail above.

[0160] In general the methods of linear polymer analysis using intensity are performed by detecting optical signals. An “optical signal” as used herein differs from a polymer dependent impulse and is a detectable electromagnetic radiation signal which transmits or conveys information about the structural characteristics of a polymer and from which a quantitative analysis of intensity can be determined. The optical signal may arise from energy transfer, quenching, radioactivity, or any other physical changes from which a quantitative measure of intensity can be derived. The signal preferably is optically detected. An “optically detectable” signal as used herein is a light based signal in the form of electromagnetic radiation which can be detected by light detecting imaging systems.

[0161] Once optically detectable signals are generated, detected and stored in a database, the signals can be analyzed to determine structural information about the polymer. The signals can be analyzed by assessing the intensity of the signal to determine structural information about the polymer. The computer used for the analysis may be the same computer used to collect data about the polymers, or may be a separate computer dedicated to data analysis. A suitable computer system to implement the present invention typically includes an output device which displays information to a user, a main unit connected to the output device and an input device which receives input from a user. The main unit generally includes a processor connected to a memory system via an interconnection mechanism. The input device and output device also are connected to the processor and memory system via the interconnection mechanism. Computer programs for data analysis of the detected signals are readily available from CCD manufacturers.

[0162] The methods of the invention can be accomplished using any device which produces a specific detectable polymer dependent impulse for an individual unit specific marker of a polymer. One type of device which enables this type of analysis is one which promotes linear transfer of a polymer past an interaction station or a signal generation station, such as an article of manufacture including a wall material having a surface defining a channel, and an agent selected from the group consisting of an electromagnetic radiation source, a quenching source, a luminescent film layer, and a fluorescence excitation source, attached to the wall material adjacent to the channel. Preferably the agent is close enough to the channel and is present in an amount sufficient to detectably interact with a partner compound selected from the group consisting of a light emissive compound and a quencher passing through the channel.

[0163] A wall material is a solid or semi-solid barrier of any dimensions which is capable of supporting at least one channel. A semi-solid material is a self supporting material and may be for instance a gel material such as a polyacrylamide gel. For instance the wall material may be composed of a single support material which may be conducting or non-conducting, light permeable or light impermeable, clear or unclear. In some instances the agent is embedded within the wall material. In these instances the wall material can be solely or partially made of a non-conducting layer, a light permeable layer or a clear layer to allow the agent to be exposed to the channel formed in the wall material to allow signal generation. When the wall material is only partially made from these materials the remaining wall material may be made from a conducting, light impermeable or unclear layer, which prevents signal generation. In some cases the wall material is made up of layers of different materials. For instance, the wall material may be made of a single conducting layer and a single non-conducting layer. Alternatively the wall material may be made of a single non-conducting layer surrounded by two conducting layers. Multiple layers and various combinations of materials are encompassed by the wall material of the invention.

[0164] As used herein a “luminescent film layer” is a film which is naturally luminescent or made luminescent by some means of excitation or illumination, e.g., electrooptic thin films and high index films illuminated by internal reflection.

[0165] As used herein a “material shield” is any material which prevents or limits energy transfer or quenching. Such materials include but are not limited to conductive materials, high index materials, and light impermeable materials. In a preferred embodiment the material shield is a conductive material shield. As used herein a “conductive material shield” is a material which is at least conductive enough to prevent energy transfer between donor and acceptor sources.

[0166] A “conductive material” as used herein is a material which is at least conductive enough to prevent energy transfer between a donor and an acceptor.

[0167] A “nonconductive material” as used herein is a material which conducts less than that amount that would allow energy transfer between a donor and an acceptor.

[0168] A “light permeable material” as used herein is a material which is permeable to light of a wavelength produced by the specific electromagnetic radiation, quenching source, or the fluorescence excitation source being used.

[0169] A “light impermeable material” as used herein is a material which is impermeable to light of a wavelength produced by the specific electromagnetic radiation, quenching source, or the fluorescence excitation source being used.

[0170] A “channel” as used herein is a passageway through a medium through which a polymer can pass. The channel can have any dimensions as long as a polymer is capable of passing through it. For instance the channel may be an unbranched straight cylindrical channel or it may be a branched network of interconnected winding channels. Preferably the channel is a straight nanochannel or a microchannel. A “nanochannel” as used herein is a channel having dimensions on the order of nanometers. The average diameter of a nanochannel is between 1 nm and 999 nm. A “microchannel” as used herein is a channel having dimensions on the order of micrometers. The average diameter of a microchannel is between 1 μm and 1 mm. Preferred specifications and dimensions of channels useful according to the invention are set forth in detail below. In a preferred embodiment, the channel is fixed in the wall.

[0171] An agent is attached to the wall material in such a manner that it will detectably interact with a partner compound by undergoing energy transfer or quenching with the partner light emissive compound which is passing through the channel of the wall material. In order to interact with the partner compound the agent can be positioned in close proximity to the channel. For example, the agent may be attached to the inside of the channel, attached to the external surface of the wall material, attached to a concentrated region of the external surface of the wall material surrounding the rim of the channel, embedded within the wall material, or embedded in the form of a concentric ring in the wall material surrounding the channel. Optionally the agent may cover the entire surface of the wall material or may be embedded throughout the entire wall material. In order to improve signal generation when the agent is not localized, a mask may be used to cover some areas of the wall material such that only localized regions of agent are exposed. A “mask” as used herein is an object which has openings of any size or shape. More than one agent may be attached to the wall material in order to produce different signals when the agents are exposed to the partner agent.

[0172] The agent may be attached to the surface of the wall material by any means of performing attachment known in the art. Examples of methods for conjugating biomaterials are presented in Hermanson, G. T., Bioconjugate Techniques, Academic Press, Inc., San Diego, 1996, which is hereby incorporated by reference.

[0173] When the agent is attached to the surface of the wall material it may be attached directly to the wall material or it may be attached via a linker. A “linker” as used herein with respect to the attachment of the agent is a molecule that tethers a light emitting compound or a quenching compound to the wall material. Linkers are well known in the art. Commonly used linkers include alkanes of various lengths.

[0174] The agent is attached to the wall material in an amount sufficient to detectably interact with a partner light emissive compound. As used herein a “partner light emissive compound” is a light emissive compound as defined above but which specifically interacts with and undergoes energy transfer or quenching when positioned in close proximity to the agent. The amount of partner light emissive compound and the amount of agent required will depend on the type of agent and light emissive compound used.

[0175] Another example of an article of manufacture which is useful for practicing the method of the invention is a wall material having a surface defining a plurality of channels and a station attached to a discrete region of the wall material adjacent to at least one of the channels, wherein the station is close enough to the channel and is present in an amount sufficient to cause a signal to arise from a detectable physical change in a polymer of linked unit specific markers passing through the channel or in the station as the polymer is exposed to the station. A “discrete region” of the wall material adjacent to at least one of the channels is a local area which is surrounded by wall material not having a station. The area surrounding the discrete region does not interact with the unit specific marker to produce the same characteristic signal produced by the interaction of the unit specific marker with the station. The discrete region is positioned in or near the channel such that the station at the discrete region is exposed to the unit specific marker as it traverses the channel.

[0176] An additional article of manufacture useful for practicing the method of the invention is a wall material having a surface defining a channel and a plurality of stations each attached to a discrete region of the wall material adjacent to the channel, wherein the stations are close enough to the channel and are present in an amount sufficient to cause a signal to arise from a detectable physical change in a polymer of linked unit specific markers passing through the channel or in the station as the polymer is exposed to the station.

[0177] As used herein a “plurality of stations” is at least two stations. Preferably a plurality of stations is at least three stations. In another preferred embodiment a plurality of stations is at least five stations.

[0178] PCT Patent Application PCT/US98/03024 provides a detailed description of an optimal design of a nanochannel plate having fluorophores embedded within the plate as well as other articles useful for practicing the methods of the invention. The methods of the invention are not limited, however, to the use of articles of manufacture described herein or in the priority PCT application. The examples are provided for illustrative purposes only. The methods of the invention can be performed using any system in which a plurality of unit specific markers of a polymer can be moved with respect to a fixed station and from which signals can be obtained.

[0179] A preferred method of the invention involves the analysis of radiolabeled polymers as discussed above. Preparation of radiolabeled polymers such as DNA (for example, with ³²P or ³H) is known in the art. The following description represents one of the many possible embodiments of analyzing radiolabeled polymers according to the methods of the invention. A radiolabeled nucleic acid molecule is analyzed with a single fabricated multilayered nanochannel. The nanochannel is the diameter of the labeled nucleic acid molecule and is constructed to yield a defined region of detection. Exemplary nanochannel plates include a heterogeneous multilayered structure consisting of two radiation impermeable layers such as lead or Lucite films and a film of lower density material (or scintillation layer) (i.e., conventional polymers, polymethylmethacrylate, polystyrene, Teflon, etc.). The lead films sandwich the layer of lower density material and are of such thickness as to prevent passage of radiation. The lower density material permits passage of radiation, thereby creating a defined region of radiation detection. As the radiolabel on the DNA passes through the defined region of detection, nuclear radiation is emitted, some of which will pass through the defined region of radiation detection.

[0180] In a related embodiment of analysis of radiolabeled nucleotides a detection system based on scintillation counting and multiple nanochannels is presented. A nanochannel array is fabricated. Multiple multilayered channels exist for parallel amplification of data output. Each individual channel consists of two nuclear radiation shielding layers which shield nuclear radiation, and a scintillation layer which is fluorescently excited with exposure to nuclear radiation. The individual channels are separated from each other by a nuclear radiation shielding material. The nuclear radiation is prevented from reaching the fluorescent detection system by overlaying with optical quality Lucite (or any other transparent material which prevents the passage of nuclear radiation). This allows only the fluorescent signal to reach the detection system.

[0181] Each of the above described nanochannels is only an example. It is, therefore, anticipated that each of the limitations described with respect to these embodiments involving any one element or combinations of elements can be included in each nanochannel. Preparation of films having multiple layers of differing material has been described in the art, e.g., U.S. Pat. No. 5,462,467 and Ferreira et al., Thin Solid Films 244:806-809 (1994).

[0182] In one embodiment the signal station is fixed. A station is “fixed” as used herein if the station and the detection device do not move. The polymer may move past the fixed station but the station does not move. In one embodiment the station is an interaction station. As used herein an “interaction station or site” is a region where a unit specific marker of the polymer interacts with an agent and is positioned with respect to the agent in close enough proximity whereby they can interact. The interaction station for fluorophores, for example, is that region where they are close enough so that they energetically interact to produce a signal.

[0183] Methods for preparing the wall material and the various light conductive and non-conductive layers, etc., are described in co-pending PCT Patent Application PCT/US98/03024.

[0184] An example of an apparatus constructed to hold a nanochannel (or microchannel) plate which is capable of generating an electric field is described. The electric field, created by electrodes, is used to draw the DNA through the nanochannels. The exemplary nanochannel plate is immersed in a slightly viscous buffer solution which helps to slow the transit of the polymer through the nanochannel, so as to allow for a longer signal collection time per base. In addition, on either side of the plate are electrodes immersed in the buffer solution. The ensemble of nanochannel plate, buffer compartments, and electrodes is contained in an optical quality glass chamber suitable for image analysis and is positioned adjacent to a 60×, 1.4 NA oil objective.

[0185] As discussed above the use of an electric field to cause the polymer to move linearly through a channel is preferred. The use of an electric field is suitable because the stretched, linear orientation of a polymer in an electric field is favorable for linear crossing of nanochannels. Furthermore, the rate of polymer movement can be controlled by voltage. Lastly, an electric field does not adversely affect FRET.

[0186] Light microscopy (Bustamante, 1991; Gurrieri et al., 1990; Matsumoto et al., 1981; Rampino and Chrambach, 1990; Schwartz and Koval, 1989; Smith et al., 1989), linear dichroism (LD) (Åkerman et al., 1990; Åkerman et al., 1985; Moore et al., 1986), fluorescence-detected LD (Holzwarth et al., 1987), and linear birefringence (Sturm and Weill, 1989; Chu et al., 1990) can be used to study the instantaneous changes in shape of DNA molecules undergoing gel electrophoresis. In these studies DNA is shown to be strongly oriented and stretched.

[0187] Gurrieri et al., 1990 have demonstrated the linear and stretched conformation of DNA molecules in an electric field. In each of the cases, the DNA molecule is clearly aligned in the direction of the applied electric field. The method used to visualize the DNA molecules combines fluorescent DNA labeling and use of an image intensifier-video camera system (Bustamante, 1991; Houseal et al., 1989; Morikawa and Yanagida, 1981; Matsumoto et al., 1989; Yanagida et al., 1983). The DNA molecules shown are T2 molecules which are 164 kbp long.

[0188] The orientation of DNA in an electric field has been well studied with linear dichroism and electric dichroism (Ding et al., 1972; Yamaoka and Charney, 1973; Colson et al., 1974; Hogan et al., 1978; Priore and Allen, 1979; Yamaoka and Matsuda, 1981; Wu et al., 1981). In fact, the first studies done on DNA orientation were performed with these two techniques. DNA was first studied in solution and then subsequently in electrophoretic gels. Studies both in solution and in gels yield similar results in that the DNA molecules are indeed oriented in the direction of the electric field.

[0189] The invention in another aspect is a kit for labeling polymers. The kit includes a container housing a series of distinct nucleic acid probes, wherein the series of nucleic acid probes is a set of multiple base pair probes. Preferably the multiple base pair probes are selected from the group consisting of two base pair probes, three base pair probes, four base pair probes, and five base pair probes, but may be any number of base pairs greater than one.

[0190] A “base pair probe” as used herein refers to a single stranded nucleic acid fragment that can be used to hybridize to an unknown sequence. The term is used consistently with its common meaning in the art. A “set of multiple base pair probes” as used herein is a series of base pair probes of a specific length, wherein the series of probes includes each possible combination of nucleic acid sequences of that particular length. For any given probe length, the number of different base pair probes which can be used is defined by the formula 4^n, where n is the number of nucleotides spanned by the probe. For instance a two base pair probe, which spans two nucleotides, would include 16 combinations of nucleotide pairs. These are AC, AG, AT, AA, CC, CA, CG, CT, GA, GG, GC, GT, TA, TC, TG, and TT. A three base pair probe, which spans three nucleotides, would include 64 three-nucleotide combinations. Those of ordinary skill in the art would easily be able to identify each of the different combinations.
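A complete set of probes of a given length can be enumerated mechanically. The following sketch uses the standard-library `itertools.product` to list every sequence over the four DNA bases; the function name `probe_set` is a hypothetical label chosen for the illustration.

```python
from itertools import product


def probe_set(n, alphabet="ACGT"):
    """Return every probe sequence of length n over the given bases (4**n for DNA)."""
    if n < 1:
        raise ValueError("probe length must be at least 1")
    return ["".join(bases) for bases in product(alphabet, repeat=n)]


two_base_probes = probe_set(2)    # 4**2 sequences, from "AA" to "TT"
three_base_probes = probe_set(3)  # 4**3 sequences
```

The size of each set follows the 4^n formula, so the list grows quickly with probe length (1,024 sequences for five-base probes).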

[0191] In one embodiment the container is a single container having a plurality of compartments, each housing a specific labeled probe. In another embodiment the container is a plurality of containers. An example of a kit is presented in FIG. 10.

[0192] The kit in one embodiment also includes instructions for labeling the nucleic acid probes, if the probes are not already labeled.

[0193] The distinct nucleic acid probes are labeled in one embodiment. Preferably the nucleic acid probes are labeled with an agent selected from the group consisting of an electromagnetic radiation source, a quenching source and a fluorescence excitation source. In one embodiment the plurality of polymers is a homogenous population. In another embodiment the distinct nucleic acid probes are three base pair probes. In another embodiment the distinct nucleic acid probes are four base pair probes. In yet another embodiment the distinct nucleic acid probes are five base pair probes.

EXAMPLES

Example 1

[0194] The DNA solution is prepared by mixing the DNA with TOTO-1 obtained from Molecular Probes, OR. TOTO-1 is a dimeric cyanine dye which is virtually nonfluorescent in the absence of binding to dsDNA. Upon binding the fluorescence is increased over 1000-fold. Excitation of the dye is at 514 nm and emission is at 533 nm. Nanomolar concentrations of the dye are used in 0.5×TBE buffer. The polycarbonate membranes are blackened with a solution prepared by dissolving 2 gm of Irgalan black (Chemical Index, acid black 107) in 1 L of 2% acetic acid. Membranes are soaked in the Irgalan black solution for 24 hours. The samples are rinsed with water and dried in air. The membrane is mounted in a dual buffer compartment which has an optical window for collecting fluorescence images. An electric field of 1 V/cm to 20 V/cm is used to electrophoretically drive the DNA through the nanochannels, which can range in diameter from 5 nm to 10 nm. DNA crossing the nanochannels non-linearly would give aberrant intensity signals which, upon statistical average, would give rise to noise.

[0195] The foregoing written specification is considered to besufficient to enable one skilled in the art to practice the invention.The present invention is not to be limited in scope by examplesprovided, since the examples are intended as a single illustration ofone aspect of the invention and other functionally equivalentembodiments are within the scope of the invention. Various modificationsof the invention in addition to those shown and described herein willbecome apparent to those skilled in the art from the foregoingdescription and fall within the scope of the appended claims. Theadvantages and objects of the invention are not necessarily encompassedby each embodiment of the invention.

[0196] All references, patents and patent publications that are recited in this application are incorporated in their entirety herein by reference.

Sequence listing (3 sequences):
SEQ ID NO: 1, length 9, DNA, Artificial Sequence, synthetic sequence: aggcaaacg
SEQ ID NO: 2, length 10, DNA, Artificial Sequence, synthetic sequence: agtgcaaacg
SEQ ID NO: 3, length 16, DNA, Artificial Sequence, synthetic sequence: acgtacgtac gtacgt

I claim:
 1. A method for determining an order of at least two labeled unit specific markers of a polymer comprising, obtaining polymer dependent impulses for at least two labeled unit specific markers of a plurality of polymers, comparing the polymer dependent impulses obtained from each of the plurality of polymers, determining an order of the at least two labeled unit specific markers from the polymers based upon comparing the polymer dependent impulses.
 2. The method of claim 1, wherein the plurality of polymers is a homogeneous population.
 3. The method of claim 1, wherein the polymer dependent impulses arise from unit specific markers of less than all units of the polymers.
 4. The method of claim 1, wherein the polymers are randomly labeled.
 5. The method of claim 3, wherein the polymer dependent impulses arise from at least two unit specific markers of the polymers.
 6. The method of claim 1, wherein the polymer is a nucleic acid.
 7. The method of claim 6, wherein the obtained polymer dependent impulses include the time of separation between unit specific markers.
 8. The method of claim 6, wherein the unit specific markers are nucleic acid probes.
 9. The method of claim 8, wherein the unit specific markers are two base pair nucleic acid probes.
 10. The method of claim 8, wherein the unit specific markers are three base pair nucleic acid probes.
 11. The method of claim 6, wherein the unit specific markers are peptide nucleic acid probes.
 12. The method of claim 6, wherein the obtained polymer dependent impulses indicate the sequence of units of the polymer.
 13. The method of claim 1, wherein a portion of the unit specific markers are unknown.
 14. The method of claim 1, wherein each polymer is analyzed separately.
 15. A method for sequencing a polymer of linked units comprising, obtaining polymer dependent impulses from a plurality of overlapping polymers, at least a portion of each of the polymers having a sequence of linked units identical to the other of the polymers, and comparing the polymer dependent impulses from an overlapping portion of each of the plurality of polymers to obtain a sequence of linked units which is identical in the plurality of polymers.
 16. The method of claim 15, wherein the polymer dependent impulses are optically detectable.
 17. The method of claim 15, wherein the plurality of polymers is a homogeneous population.
 18. The method of claim 15, wherein the plurality of polymers is a heterogeneous population.
 19. The method of claim 15, wherein the plurality of polymers is randomly labeled.
 20. The method of claim 15, wherein the polymers are nucleic acids.
 21. The method of claim 20, wherein the nucleic acids are labeled with an agent selected from the group consisting of an electromagnetic radiation source, a quenching source and a fluorescence excitation source.
 22. The method of claim 15, wherein each polymer is analyzed separately.
 23. A kit for labeling polymers, comprising: a container housing a series of distinct nucleic acid probes; wherein the series of nucleic acid probes is a set of multiple base pair probes.
 24. The kit of claim 23, wherein the multiple base pair probes are selected from the group consisting of two base pair probes, three base pair probes, four base pair probes, and five base pair probes.
 25. The kit of claim 23, wherein the container is a single container having a plurality of compartments, each housing a specific labeled probe.
 26. The kit of claim 23, wherein the container is a plurality of containers each containing a multiple base pair probe having a different sequence.
 27. The kit of claim 23, further comprising instructions for labeling the nucleic acid probes.
 28. The kit of claim 23, wherein the distinct nucleic acid probes are labeled.
 29. The kit of claim 23, wherein the distinct nucleic acid probes are two base pair probes.
 30. The kit of claim 23, wherein the distinct nucleic acid probes are three base pair probes.
 31. The kit of claim 23, wherein the distinct nucleic acid probes are four base pair probes.
 32. The kit of claim 23, wherein the distinct nucleic acid probes are five base pair probes.
 33. A method for analyzing a polymer, comprising linearly moving a labeled polymer with respect to a fixed station, obtaining a signal from the labeled polymer as the labeled polymer passes the fixed station, wherein the signal is an electromagnetic radiation signal arising from an interaction between at least two distinct labeled unit specific markers, determining a quantitative measure of intensity of the signal to analyze the polymer.
 34. The method of claim 33, wherein each unit of the labeled polymer is labeled and the quantitative measure of intensity of the signal indicates the length of the polymer.
 35. The method of claim 33, wherein less than all units of the polymer are labeled with at least one unit specific marker and the quantitative measure of intensity of the signal indicates the number of labeled unit specific markers present in the polymer.
 36. The method of claim 33, wherein the fixed station is an electromagnetic radiation source.
 37. The method of claim 33, wherein the fixed station is a radiation source.
 38. The method of claim 33, wherein a plurality of polymers are analyzed simultaneously to produce a plurality of signals, one signal for each polymer, and further comprising the step of comparing the intensities of the signals to analyze the polymers.
 39. The method of claim 33, wherein a plurality of polymers are analyzed simultaneously to produce a signal.
 40. The method of claim 39, wherein the number of polymers is known and wherein each of the polymers is identically labeled and further comprising the step of dividing the intensity by the number of polymers to determine the number of labeled unit specific markers in each polymer.
 41. The method of claim 33, wherein the units are labeled with a peptide nucleic acid probe.
 42. The method of claim 33, wherein the units are labeled with a series of distinct nucleic acid probes selected from the group consisting of two base pair probes, three base pair probes, four base pair probes, and five base pair probes.
 43. The method of claim 33, wherein the units are labeled with a fluorescent probe.
 44. The method of claim 33, wherein the labeled polymer is labeled with a plurality of unit specific markers, wherein at least one unit specific marker includes a fluorophore which emits light at a first wavelength and at least one unit specific marker which includes a fluorophore which emits light at a second wavelength.
 45. The method of claim 44, wherein the at least one unit specific marker which includes the fluorophore which emits light at the first wavelength is attached to end units of the polymer and wherein the at least one unit specific marker which includes the fluorophore which emits light at the second wavelength is attached to an internal unit of the polymer.